linux-kernel.vger.kernel.org archive mirror
* [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
       [not found] <3FCD4B66.8090905@us.ibm.com>
@ 2003-12-06  1:29 ` Daniel McNeil
  2003-12-08 18:23   ` Daniel McNeil
  0 siblings, 1 reply; 58+ messages in thread
From: Daniel McNeil @ 2003-12-06  1:29 UTC (permalink / raw)
  To: Janet Morgan
  Cc: Suparna Bhattacharya, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 4992 bytes --]

Janet,

I think I found the problem that is causing the dio reads to get
uninitialized data with buffered writes.  i_size is sampled after
i_sem is dropped and after the I/O has completed, so i_size could
have changed in the meantime.  This patch samples i_size before
dropping i_sem.  I'll leave the test running over the weekend to
verify.

Daniel

On Tue, 2003-12-02 at 18:33, Janet Morgan wrote:
> Hi Suparna, Daniel,
> 
> Just wanted to let you know that I seem to be hitting a failure when I
> run the testcase (attached) that Stephen Tweedie wrote to expose the DIO
> races (the testcase issues buffered writes and dio reads).  I'm trying to
> debug the failure, but I know it will be slow going for me.
> 
> I hit the problem when I run multiple instances of the test against
> the same filesystem (I do specify a unique file for each instance of
> the test).   I normally run about 6 instances of the test, e.g.,
> "direct_read_under foo1", "direct_read_under foo2", etc.   The test
> runs in an infinite loop until a data mis-compare is detected.
> I have not been able to reproduce the failure when I restrict each
> instance of the test to its own filesystem.
> 
> I've tried running with the combinations below and I eventually
> get a failure in each case.   I assume all but the last combo includes
> the critical aio/dio fixes for the associated base.
> 
> Combinations tested:
> 
>    Daniel's latest:
>        2.6.0-test9-mm5 +
>        aio-dio-fallback-bio_count-race.patch +
>        direct-io-memleak-fix.patch
> 
>    Suparna's latest:
>        2.6.0-test9-mm5 +
>        Suparna's current patches:
>            
> http://marc.theaimsgroup.com/?l=linux-aio&m=106983304420570&w=2  +
>            
> http://marc.theaimsgroup.com/?l=linux-kernel&m=106904658121299&w=2  +
>        direct-io-memleak-fix.patch
>  
>     stock linux-2.6.0-test9 plus:
>       aio-refcnt.patch  +
>       dio-aio-fixes.patch +
>       dio-aio-fixes-fixes.patch +
>       O_DIRECT-race-fixes-rollup.patch 
> 
>     stock linux-2.6.0-test11 plus:
>       O_DIRECT-race-fixes-rollup.patch
>  
> Thanks,
> -Janet
> 
> ______________________________________________________________________
> 
> #define _XOPEN_SOURCE 600
> #define _GNU_SOURCE
> 
> #include <unistd.h>
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
> #include <errno.h>
> #include <sys/fcntl.h>
> #include <sys/mman.h>
> #include <sys/wait.h>
> 
> #define BIGSIZE 128*1024*1024
> #define READSIZE 32*1024*1024
> #define WRITESIZE 32*1024*1024
> 
> int pagesize;
> char *iobuf;
> 
> void assert(const char *what, int assertion)
> {
> 	if (assertion)
> 		return;
> 	perror(what);
> 	exit(1);
> }
> 
> void do_buffered_writes(int fd, int pattern)
> {
> 	int rc;
> 	int offset;
> 	
> 	memset(iobuf, pattern, WRITESIZE);
> 	for (offset = 0; offset+WRITESIZE <= BIGSIZE; offset += WRITESIZE) {
> 		rc = pwrite(fd, iobuf, WRITESIZE, offset);
> 		assert("pwrite", rc >= 0);
> 		if (rc != WRITESIZE) {
> 			fprintf(stderr, "short write (%d out of %d)\n",
> 				rc, WRITESIZE);
> 			exit(1);
> 		}
> 		fsync(fd);
> 	}
> }
> 
> int do_direct_reads(char *filename)
> {
> 	int fd;
> 	int offset;
> 	int rc, i;
> 	int *p;
> 	
> 	fd = open(filename, O_DIRECT|O_RDONLY, 0);
> 	assert("open", fd >= 0);
> 
> 	for (offset = 0; offset+READSIZE <= BIGSIZE; offset += READSIZE) {
> 		rc = pread(fd, iobuf, READSIZE, offset);
> 		assert("pread", rc >= 0);
> 		if (rc != READSIZE) {
> 			fprintf(stderr, "short read (%d out of %d)\n",
> 				rc, READSIZE);
> 			exit(1);
> 		}
> 		for (i=0, p=(int *)iobuf; i<READSIZE; i+=4) {
> 			if (*p) {
> 				fprintf(stderr,
> 					"Found data (%08x) at offset %d+%d\n",
> 					*p, offset, i);
> 				return 1;
> 			}
> 			p++;
> 		}
> 	}
> 	return 0;
> }
> 
> int main(int argc, char *argv[])
> {
> 	char *filename;
> 	int fd;
> 	int pid;
> 	int err;
> 	int pass = 0;
> 	int bufsize;
> 	
> 	if (argc != 2) {
> 		fprintf(stderr, "Needs a filename as an argument.\n");
> 		exit(1);
> 	}
> 	
> 	filename = argv[1];
> 	
> 	pagesize = getpagesize();
> 	bufsize = READSIZE;
> 	if (WRITESIZE > READSIZE)
> 		bufsize = WRITESIZE;
> 	err = posix_memalign((void**) &iobuf, pagesize, bufsize);
> 	if (err) {
> 		fprintf(stderr, "Error allocating %d aligned bytes.\n", bufsize);
> 		exit(1);
> 	}
> 	
> 	fd = open(filename, O_CREAT|O_TRUNC|O_RDWR, 0666);
> 	assert("open", fd >= 0);
> 	
> 	do {
> 		printf("Pass %d...\n", ++pass);
> 		
> 		assert("ftruncate", ftruncate(fd, BIGSIZE) == 0);
> 		fsync(fd);
> 
> 		pid = fork();
> 		assert("fork", pid >= 0);
> 		
> 		if (!pid) {
> 			do_buffered_writes(fd, 0);
> 			exit(0);
> 		}
> 		
> 		err = do_direct_reads(filename);
> 
> 		wait4(pid, NULL, WNOHANG, 0);
> 		
> 		if (err) 
> 			break;
> 
> 		/* Fill the file with a known pattern so that the blocks
> 		 * on disk can be detected if they become exposed. */
> 		do_buffered_writes(fd, 1);
> 		fsync(fd);
> 
> 		assert("ftruncate", ftruncate(fd, 0) == 0);
> 		fsync(fd);
> 	} while (1);
> 	
> 	return err;
> }
> 

[-- Attachment #2: test10-mm1.dio.patch --]
[-- Type: text/plain, Size: 1185 bytes --]

--- linux-2.6.0-test10-mm1.ddm/fs/direct-io.c	2003-12-05 17:14:57.704920048 -0800
+++ linux-2.6.0-test10-mm1.ddm.DIO/fs/direct-io.c	2003-12-05 16:55:56.000000000 -0800
@@ -909,6 +909,7 @@ direct_io_worker(int rw, struct kiocb *i
 	int ret = 0;
 	int ret2;
 	size_t bytes;
+	loff_t i_size;
 
 	dio->bio = NULL;
 	dio->inode = inode;
@@ -1017,9 +1018,15 @@ direct_io_worker(int rw, struct kiocb *i
 	 * All block lookups have been performed. For READ requests
 	 * we can let i_sem go now that its achieved its purpose
 	 * of protecting us from looking up uninitialized blocks.
+	 * 
 	 * We also need to sample i_size before we release i_sem to prevent
+	 * a racing write from changing i_size causing us to return
+	 * uninitialized data.
 	 */
-	if ((rw == READ) && dio->needs_locking)
+	if ((rw == READ) && dio->needs_locking) {
+		i_size = i_size_read(inode);
 		up(&dio->inode->i_sem);
+	}
 
 	/*
 	 * OK, all BIOs are submitted, so we can decrement bio_count to truly
@@ -1064,7 +1071,6 @@ direct_io_worker(int rw, struct kiocb *i
 		if (ret == 0)
 			ret = dio->page_errors;
 		if (ret == 0 && dio->result) {
-			loff_t i_size = i_size_read(inode);
 
 			ret = dio->result;
 			/*

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
  2003-12-06  1:29 ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Daniel McNeil
@ 2003-12-08 18:23   ` Daniel McNeil
  2003-12-12  0:51     ` Daniel McNeil
  0 siblings, 1 reply; 58+ messages in thread
From: Daniel McNeil @ 2003-12-08 18:23 UTC (permalink / raw)
  To: Janet Morgan
  Cc: Suparna Bhattacharya, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

My patch did not fix the problem. :(

The tests over the weekend reported bad data.

The sampling of the i_size after dropping i_sem still looks wrong
to me.  I will keep looking and see if I can find another problem.


Daniel

On Fri, 2003-12-05 at 17:29, Daniel McNeil wrote:
> Janet,
> 
> I think I found the problem that is causing the dio reads to get
> uninitialized data with buffered writes.  i_size is sampled after
> i_sem is dropped and after the I/O has completed, so i_size could
> have changed in the meantime.  This patch samples i_size before
> dropping i_sem.  I'll leave the test running over the weekend to
> verify.
> 
> Daniel
> 
> On Tue, 2003-12-02 at 18:33, Janet Morgan wrote:
> > Hi Suparna, Daniel,
> > 
> > Just wanted to let you know that I seem to be hitting a failure when I
> > run the testcase (attached) that Stephen Tweedie wrote to expose the DIO
> > races (the testcase issues buffered writes and dio reads).  I'm trying to
> > debug the failure, but I know it will be slow going for me.
> > 
> > I hit the problem when I run multiple instances of the test against
> > the same filesystem (I do specify a unique file for each instance of
> > the test).   I normally run about 6 instances of the test, e.g.,
> > "direct_read_under foo1", "direct_read_under foo2", etc.   The test
> > runs in an infinite loop until a data mis-compare is detected.
> > I have not been able to reproduce the failure when I restrict each
> > instance of the test to its own filesystem.
> > 
> > I've tried running with the combinations below and I eventually
> > get a failure in each case.   I assume all but the last combo includes
> > the critical aio/dio fixes for the associated base.
> > 
> > Combinations tested:
> > 
> >    Daniel's latest:
> >        2.6.0-test9-mm5 +
> >        aio-dio-fallback-bio_count-race.patch +
> >        direct-io-memleak-fix.patch
> > 
> >    Suparna's latest:
> >        2.6.0-test9-mm5 +
> >        Suparna's current patches:
> >            
> > http://marc.theaimsgroup.com/?l=linux-aio&m=106983304420570&w=2  +
> >            
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=106904658121299&w=2  +
> >        direct-io-memleak-fix.patch
> >  
> >     stock linux-2.6.0-test9 plus:
> >       aio-refcnt.patch  +
> >       dio-aio-fixes.patch +
> >       dio-aio-fixes-fixes.patch +
> >       O_DIRECT-race-fixes-rollup.patch 
> > 
> >     stock linux-2.6.0-test11 plus:
> >       O_DIRECT-race-fixes-rollup.patch
> >  
> > Thanks,
> > -Janet
> > 
> > ______________________________________________________________________
> > 
> > #define _XOPEN_SOURCE 600
> > #define _GNU_SOURCE
> > 
> > #include <unistd.h>
> > #include <stdlib.h>
> > #include <stdio.h>
> > #include <string.h>
> > #include <errno.h>
> > #include <sys/fcntl.h>
> > #include <sys/mman.h>
> > #include <sys/wait.h>
> > 
> > #define BIGSIZE 128*1024*1024
> > #define READSIZE 32*1024*1024
> > #define WRITESIZE 32*1024*1024
> > 
> > int pagesize;
> > char *iobuf;
> > 
> > void assert(const char *what, int assertion)
> > {
> > 	if (assertion)
> > 		return;
> > 	perror(what);
> > 	exit(1);
> > }
> > 
> > void do_buffered_writes(int fd, int pattern)
> > {
> > 	int rc;
> > 	int offset;
> > 	
> > 	memset(iobuf, pattern, WRITESIZE);
> > 	for (offset = 0; offset+WRITESIZE <= BIGSIZE; offset += WRITESIZE) {
> > 		rc = pwrite(fd, iobuf, WRITESIZE, offset);
> > 		assert("pwrite", rc >= 0);
> > 		if (rc != WRITESIZE) {
> > 			fprintf(stderr, "short write (%d out of %d)\n",
> > 				rc, WRITESIZE);
> > 			exit(1);
> > 		}
> > 		fsync(fd);
> > 	}
> > }
> > 
> > int do_direct_reads(char *filename)
> > {
> > 	int fd;
> > 	int offset;
> > 	int rc, i;
> > 	int *p;
> > 	
> > 	fd = open(filename, O_DIRECT|O_RDONLY, 0);
> > 	assert("open", fd >= 0);
> > 
> > 	for (offset = 0; offset+READSIZE <= BIGSIZE; offset += READSIZE) {
> > 		rc = pread(fd, iobuf, READSIZE, offset);
> > 		assert("pread", rc >= 0);
> > 		if (rc != READSIZE) {
> > 			fprintf(stderr, "short read (%d out of %d)\n",
> > 				rc, READSIZE);
> > 			exit(1);
> > 		}
> > 		for (i=0, p=(int *)iobuf; i<READSIZE; i+=4) {
> > 			if (*p) {
> > 				fprintf(stderr,
> > 					"Found data (%08x) at offset %d+%d\n",
> > 					*p, offset, i);
> > 				return 1;
> > 			}
> > 			p++;
> > 		}
> > 	}
> > 	return 0;
> > }
> > 
> > int main(int argc, char *argv[])
> > {
> > 	char *filename;
> > 	int fd;
> > 	int pid;
> > 	int err;
> > 	int pass = 0;
> > 	int bufsize;
> > 	
> > 	if (argc != 2) {
> > 		fprintf(stderr, "Needs a filename as an argument.\n");
> > 		exit(1);
> > 	}
> > 	
> > 	filename = argv[1];
> > 	
> > 	pagesize = getpagesize();
> > 	bufsize = READSIZE;
> > 	if (WRITESIZE > READSIZE)
> > 		bufsize = WRITESIZE;
> > 	err = posix_memalign((void**) &iobuf, pagesize, bufsize);
> > 	if (err) {
> > 		fprintf(stderr, "Error allocating %d aligned bytes.\n", bufsize);
> > 		exit(1);
> > 	}
> > 	
> > 	fd = open(filename, O_CREAT|O_TRUNC|O_RDWR, 0666);
> > 	assert("open", fd >= 0);
> > 	
> > 	do {
> > 		printf("Pass %d...\n", ++pass);
> > 		
> > 		assert("ftruncate", ftruncate(fd, BIGSIZE) == 0);
> > 		fsync(fd);
> > 
> > 		pid = fork();
> > 		assert("fork", pid >= 0);
> > 		
> > 		if (!pid) {
> > 			do_buffered_writes(fd, 0);
> > 			exit(0);
> > 		}
> > 		
> > 		err = do_direct_reads(filename);
> > 
> > 		wait4(pid, NULL, WNOHANG, 0);
> > 		
> > 		if (err) 
> > 			break;
> > 
> > 		/* Fill the file with a known pattern so that the blocks
> > 		 * on disk can be detected if they become exposed. */
> > 		do_buffered_writes(fd, 1);
> > 		fsync(fd);
> > 
> > 		assert("ftruncate", ftruncate(fd, 0) == 0);
> > 		fsync(fd);
> > 	} while (1);
> > 	
> > 	return err;
> > }
> > 



* Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
  2003-12-08 18:23   ` Daniel McNeil
@ 2003-12-12  0:51     ` Daniel McNeil
  2003-12-17  1:25       ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Daniel McNeil
  2003-12-30  4:53       ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Suparna Bhattacharya
  0 siblings, 2 replies; 58+ messages in thread
From: Daniel McNeil @ 2003-12-12  0:51 UTC (permalink / raw)
  To: Janet Morgan
  Cc: Suparna Bhattacharya, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

I've done more testing with added debug code.

It looks like the filemap_write_and_wait() call is returning
with data that has not made it to disk.

I added code to filemap_write_and_wait() to check whether
mapping->dirty_pages is non-empty after calling filemap_fdatawrite()
and filemap_fdatawait(), and to retry if it is.  Even with the retry,
the test still sees uninitialized data when running overnight (I added
a printk so I know the retry is happening).  There are pages left on
the dirty_pages list even after the write and wait.

I've added a bunch more debug code and am currently running the test
again to see if I can find out what is going on.

I'll send more email when I know more.

Daniel



On Mon, 2003-12-08 at 10:23, Daniel McNeil wrote:
> My patch did not fix the problem. :(
> 
> The tests over the weekend reported bad data.
> 
> The sampling of the i_size after dropping i_sem still looks wrong
> to me.  I will keep looking and see if I can find another problem.
> 
> 
> Daniel
> 
> On Fri, 2003-12-05 at 17:29, Daniel McNeil wrote:
> > Janet,
> > 
> > I think I found the problem that is causing the dio reads to get
> > uninitialized data with buffered writes.  i_size is sampled after
> > i_sem is dropped and after the I/O has completed, so i_size could
> > have changed in the meantime.  This patch samples i_size before
> > dropping i_sem.  I'll leave the test running over the weekend to
> > verify.
> > 
> > Daniel
> > 
> > On Tue, 2003-12-02 at 18:33, Janet Morgan wrote:
> > > Hi Suparna, Daniel,
> > > 
> > > Just wanted to let you know that I seem to be hitting a failure when I
> > > run the testcase (attached) that Stephen Tweedie wrote to expose the DIO
> > > races (the testcase issues buffered writes and dio reads).  I'm trying to
> > > debug the failure, but I know it will be slow going for me.
> > > 
> > > I hit the problem when I run multiple instances of the test against
> > > the same filesystem (I do specify a unique file for each instance of
> > > the test).   I normally run about 6 instances of the test, e.g.,
> > > "direct_read_under foo1", "direct_read_under foo2", etc.   The test
> > > runs in an infinite loop until a data mis-compare is detected.
> > > I have not been able to reproduce the failure when I restrict each
> > > instance of the test to its own filesystem.
> > > 
> > > I've tried running with the combinations below and I eventually
> > > get a failure in each case.   I assume all but the last combo includes
> > > the critical aio/dio fixes for the associated base.
> > > 
> > > Combinations tested:
> > > 
> > >    Daniel's latest:
> > >        2.6.0-test9-mm5 +
> > >        aio-dio-fallback-bio_count-race.patch +
> > >        direct-io-memleak-fix.patch
> > > 
> > >    Suparna's latest:
> > >        2.6.0-test9-mm5 +
> > >        Suparna's current patches:
> > >            
> > > http://marc.theaimsgroup.com/?l=linux-aio&m=106983304420570&w=2  +
> > >            
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=106904658121299&w=2  +
> > >        direct-io-memleak-fix.patch
> > >  
> > >     stock linux-2.6.0-test9 plus:
> > >       aio-refcnt.patch  +
> > >       dio-aio-fixes.patch +
> > >       dio-aio-fixes-fixes.patch +
> > >       O_DIRECT-race-fixes-rollup.patch 
> > > 
> > >     stock linux-2.6.0-test11 plus:
> > >       O_DIRECT-race-fixes-rollup.patch
> > >  
> > > Thanks,
> > > -Janet
> > > 
> > > ______________________________________________________________________
> > > 
> > > #define _XOPEN_SOURCE 600
> > > #define _GNU_SOURCE
> > > 
> > > #include <unistd.h>
> > > #include <stdlib.h>
> > > #include <stdio.h>
> > > #include <string.h>
> > > #include <errno.h>
> > > #include <sys/fcntl.h>
> > > #include <sys/mman.h>
> > > #include <sys/wait.h>
> > > 
> > > #define BIGSIZE 128*1024*1024
> > > #define READSIZE 32*1024*1024
> > > #define WRITESIZE 32*1024*1024
> > > 
> > > int pagesize;
> > > char *iobuf;
> > > 
> > > void assert(const char *what, int assertion)
> > > {
> > > 	if (assertion)
> > > 		return;
> > > 	perror(what);
> > > 	exit(1);
> > > }
> > > 
> > > void do_buffered_writes(int fd, int pattern)
> > > {
> > > 	int rc;
> > > 	int offset;
> > > 	
> > > 	memset(iobuf, pattern, WRITESIZE);
> > > 	for (offset = 0; offset+WRITESIZE <= BIGSIZE; offset += WRITESIZE) {
> > > 		rc = pwrite(fd, iobuf, WRITESIZE, offset);
> > > 		assert("pwrite", rc >= 0);
> > > 		if (rc != WRITESIZE) {
> > > 			fprintf(stderr, "short write (%d out of %d)\n",
> > > 				rc, WRITESIZE);
> > > 			exit(1);
> > > 		}
> > > 		fsync(fd);
> > > 	}
> > > }
> > > 
> > > int do_direct_reads(char *filename)
> > > {
> > > 	int fd;
> > > 	int offset;
> > > 	int rc, i;
> > > 	int *p;
> > > 	
> > > 	fd = open(filename, O_DIRECT|O_RDONLY, 0);
> > > 	assert("open", fd >= 0);
> > > 
> > > 	for (offset = 0; offset+READSIZE <= BIGSIZE; offset += READSIZE) {
> > > 		rc = pread(fd, iobuf, READSIZE, offset);
> > > 		assert("pread", rc >= 0);
> > > 		if (rc != READSIZE) {
> > > 			fprintf(stderr, "short read (%d out of %d)\n",
> > > 				rc, READSIZE);
> > > 			exit(1);
> > > 		}
> > > 		for (i=0, p=(int *)iobuf; i<READSIZE; i+=4) {
> > > 			if (*p) {
> > > 				fprintf(stderr,
> > > 					"Found data (%08x) at offset %d+%d\n",
> > > 					*p, offset, i);
> > > 				return 1;
> > > 			}
> > > 			p++;
> > > 		}
> > > 	}
> > > 	return 0;
> > > }
> > > 
> > > int main(int argc, char *argv[])
> > > {
> > > 	char *filename;
> > > 	int fd;
> > > 	int pid;
> > > 	int err;
> > > 	int pass = 0;
> > > 	int bufsize;
> > > 	
> > > 	if (argc != 2) {
> > > 		fprintf(stderr, "Needs a filename as an argument.\n");
> > > 		exit(1);
> > > 	}
> > > 	
> > > 	filename = argv[1];
> > > 	
> > > 	pagesize = getpagesize();
> > > 	bufsize = READSIZE;
> > > 	if (WRITESIZE > READSIZE)
> > > 		bufsize = WRITESIZE;
> > > 	err = posix_memalign((void**) &iobuf, pagesize, bufsize);
> > > 	if (err) {
> > > 		fprintf(stderr, "Error allocating %d aligned bytes.\n", bufsize);
> > > 		exit(1);
> > > 	}
> > > 	
> > > 	fd = open(filename, O_CREAT|O_TRUNC|O_RDWR, 0666);
> > > 	assert("open", fd >= 0);
> > > 	
> > > 	do {
> > > 		printf("Pass %d...\n", ++pass);
> > > 		
> > > 		assert("ftruncate", ftruncate(fd, BIGSIZE) == 0);
> > > 		fsync(fd);
> > > 
> > > 		pid = fork();
> > > 		assert("fork", pid >= 0);
> > > 		
> > > 		if (!pid) {
> > > 			do_buffered_writes(fd, 0);
> > > 			exit(0);
> > > 		}
> > > 		
> > > 		err = do_direct_reads(filename);
> > > 
> > > 		wait4(pid, NULL, WNOHANG, 0);
> > > 		
> > > 		if (err) 
> > > 			break;
> > > 
> > > 		/* Fill the file with a known pattern so that the blocks
> > > 		 * on disk can be detected if they become exposed. */
> > > 		do_buffered_writes(fd, 1);
> > > 		fsync(fd);
> > > 
> > > 		assert("ftruncate", ftruncate(fd, 0) == 0);
> > > 		fsync(fd);
> > > 	} while (1);
> > > 	
> > > 	return err;
> > > }
> > > 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-12  0:51     ` Daniel McNeil
@ 2003-12-17  1:25       ` Daniel McNeil
  2003-12-17  2:03         ` Andrew Morton
  2003-12-30  4:53       ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Suparna Bhattacharya
  1 sibling, 1 reply; 58+ messages in thread
From: Daniel McNeil @ 2003-12-17  1:25 UTC (permalink / raw)
  To: Janet Morgan
  Cc: Suparna Bhattacharya, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 8469 bytes --]

I have found something else that might be causing the problem.
filemap_fdatawait() skips pages that are not marked PG_writeback.
However, when a page is about to be written, PG_dirty is cleared
before PG_writeback is set (while PG_locked is held).  So it
looks like filemap_fdatawait() can see a page just before it is
going to be written and not wait for it.  Here is a patch that
makes filemap_fdatawait() wait for locked pages as well, to make
sure it does not miss any pages.

I'm running the test on a kernel with this patch overnight to see
if it fixes the problem.

Thoughts?

Daniel

On Thu, 2003-12-11 at 16:51, Daniel McNeil wrote:
> I've done more testing with added debug code.
> 
> It looks like the filemap_write_and_wait() call is returning
> with data that has not made it to disk.
> 
> I added code to filemap_write_and_wait() to check whether
> mapping->dirty_pages is non-empty after calling filemap_fdatawrite()
> and filemap_fdatawait(), and to retry if it is.  Even with the retry,
> the test still sees uninitialized data when running overnight (I added
> a printk so I know the retry is happening).  There are pages left on
> the dirty_pages list even after the write and wait.
> 
> I've added a bunch more debug code and am currently running the test
> again to see if I can find out what is going on.
> 
> I'll send more email when I know more.
> 
> Daniel
> 
> 
> 
> On Mon, 2003-12-08 at 10:23, Daniel McNeil wrote:
> > My patch did not fix the problem. :(
> > 
> > The tests over the weekend reported bad data.
> > 
> > The sampling of the i_size after dropping i_sem still looks wrong
> > to me.  I will keep looking and see if I can find another problem.
> > 
> > 
> > Daniel
> > 
> > On Fri, 2003-12-05 at 17:29, Daniel McNeil wrote:
> > > Janet,
> > > 
> > > I think I found the problem that is causing the dio reads to get
> > > uninitialized data with buffered writes.  i_size is sampled after
> > > i_sem is dropped and after the I/O has completed, so i_size could
> > > have changed in the meantime.  This patch samples i_size before
> > > dropping i_sem.  I'll leave the test running over the weekend to
> > > verify.
> > > 
> > > Daniel
> > > 
> > > On Tue, 2003-12-02 at 18:33, Janet Morgan wrote:
> > > > Hi Suparna, Daniel,
> > > > 
> > > > Just wanted to let you know that I seem to be hitting a failure when I
> > > > run the testcase (attached) that Stephen Tweedie wrote to expose the DIO
> > > > races (the testcase issues buffered writes and dio reads).  I'm trying to
> > > > debug the failure, but I know it will be slow going for me.
> > > > 
> > > > I hit the problem when I run multiple instances of the test against
> > > > the same filesystem (I do specify a unique file for each instance of
> > > > the test).   I normally run about 6 instances of the test, e.g.,
> > > > "direct_read_under foo1", "direct_read_under foo2", etc.   The test
> > > > runs in an infinite loop until a data mis-compare is detected.
> > > > I have not been able to reproduce the failure when I restrict each
> > > > instance of the test to its own filesystem.
> > > > 
> > > > I've tried running with the combinations below and I eventually
> > > > get a failure in each case.   I assume all but the last combo includes
> > > > the critical aio/dio fixes for the associated base.
> > > > 
> > > > Combinations tested:
> > > > 
> > > >    Daniel's latest:
> > > >        2.6.0-test9-mm5 +
> > > >        aio-dio-fallback-bio_count-race.patch +
> > > >        direct-io-memleak-fix.patch
> > > > 
> > > >    Suparna's latest:
> > > >        2.6.0-test9-mm5 +
> > > >        Suparna's current patches:
> > > >            
> > > > http://marc.theaimsgroup.com/?l=linux-aio&m=106983304420570&w=2  +
> > > >            
> > > > http://marc.theaimsgroup.com/?l=linux-kernel&m=106904658121299&w=2  +
> > > >        direct-io-memleak-fix.patch
> > > >  
> > > >     stock linux-2.6.0-test9 plus:
> > > >       aio-refcnt.patch  +
> > > >       dio-aio-fixes.patch +
> > > >       dio-aio-fixes-fixes.patch +
> > > >       O_DIRECT-race-fixes-rollup.patch 
> > > > 
> > > >     stock linux-2.6.0-test11 plus:
> > > >       O_DIRECT-race-fixes-rollup.patch
> > > >  
> > > > Thanks,
> > > > -Janet
> > > > 
> > > > ______________________________________________________________________
> > > > 
> > > > #define _XOPEN_SOURCE 600
> > > > #define _GNU_SOURCE
> > > > 
> > > > #include <unistd.h>
> > > > #include <stdlib.h>
> > > > #include <stdio.h>
> > > > #include <string.h>
> > > > #include <errno.h>
> > > > #include <sys/fcntl.h>
> > > > #include <sys/mman.h>
> > > > #include <sys/wait.h>
> > > > 
> > > > #define BIGSIZE 128*1024*1024
> > > > #define READSIZE 32*1024*1024
> > > > #define WRITESIZE 32*1024*1024
> > > > 
> > > > int pagesize;
> > > > char *iobuf;
> > > > 
> > > > void assert(const char *what, int assertion)
> > > > {
> > > > 	if (assertion)
> > > > 		return;
> > > > 	perror(what);
> > > > 	exit(1);
> > > > }
> > > > 
> > > > void do_buffered_writes(int fd, int pattern)
> > > > {
> > > > 	int rc;
> > > > 	int offset;
> > > > 	
> > > > 	memset(iobuf, pattern, WRITESIZE);
> > > > 	for (offset = 0; offset+WRITESIZE <= BIGSIZE; offset += WRITESIZE) {
> > > > 		rc = pwrite(fd, iobuf, WRITESIZE, offset);
> > > > 		assert("pwrite", rc >= 0);
> > > > 		if (rc != WRITESIZE) {
> > > > 			fprintf(stderr, "short write (%d out of %d)\n",
> > > > 				rc, WRITESIZE);
> > > > 			exit(1);
> > > > 		}
> > > > 		fsync(fd);
> > > > 	}
> > > > }
> > > > 
> > > > int do_direct_reads(char *filename)
> > > > {
> > > > 	int fd;
> > > > 	int offset;
> > > > 	int rc, i;
> > > > 	int *p;
> > > > 	
> > > > 	fd = open(filename, O_DIRECT|O_RDONLY, 0);
> > > > 	assert("open", fd >= 0);
> > > > 
> > > > 	for (offset = 0; offset+READSIZE <= BIGSIZE; offset += READSIZE) {
> > > > 		rc = pread(fd, iobuf, READSIZE, offset);
> > > > 		assert("pread", rc >= 0);
> > > > 		if (rc != READSIZE) {
> > > > 			fprintf(stderr, "short read (%d out of %d)\n",
> > > > 				rc, READSIZE);
> > > > 			exit(1);
> > > > 		}
> > > > 		for (i=0, p=(int *)iobuf; i<READSIZE; i+=4) {
> > > > 			if (*p) {
> > > > 				fprintf(stderr,
> > > > 					"Found data (%08x) at offset %d+%d\n",
> > > > 					*p, offset, i);
> > > > 				return 1;
> > > > 			}
> > > > 			p++;
> > > > 		}
> > > > 	}
> > > > 	return 0;
> > > > }
> > > > 
> > > > int main(int argc, char *argv[])
> > > > {
> > > > 	char *filename;
> > > > 	int fd;
> > > > 	int pid;
> > > > 	int err;
> > > > 	int pass = 0;
> > > > 	int bufsize;
> > > > 	
> > > > 	if (argc != 2) {
> > > > 		fprintf(stderr, "Needs a filename as an argument.\n");
> > > > 		exit(1);
> > > > 	}
> > > > 	
> > > > 	filename = argv[1];
> > > > 	
> > > > 	pagesize = getpagesize();
> > > > 	bufsize = READSIZE;
> > > > 	if (WRITESIZE > READSIZE)
> > > > 		bufsize = WRITESIZE;
> > > > 	err = posix_memalign((void**) &iobuf, pagesize, bufsize);
> > > > 	if (err) {
> > > > 		fprintf(stderr, "Error allocating %d aligned bytes.\n", bufsize);
> > > > 		exit(1);
> > > > 	}
> > > > 	
> > > > 	fd = open(filename, O_CREAT|O_TRUNC|O_RDWR, 0666);
> > > > 	assert("open", fd >= 0);
> > > > 	
> > > > 	do {
> > > > 		printf("Pass %d...\n", ++pass);
> > > > 		
> > > > 		assert("ftruncate", ftruncate(fd, BIGSIZE) == 0);
> > > > 		fsync(fd);
> > > > 
> > > > 		pid = fork();
> > > > 		assert("fork", pid >= 0);
> > > > 		
> > > > 		if (!pid) {
> > > > 			do_buffered_writes(fd, 0);
> > > > 			exit(0);
> > > > 		}
> > > > 		
> > > > 		err = do_direct_reads(filename);
> > > > 
> > > > 		wait4(pid, NULL, WNOHANG, 0);
> > > > 		
> > > > 		if (err) 
> > > > 			break;
> > > > 
> > > > 		/* Fill the file with a known pattern so that the blocks
> > > > 		 * on disk can be detected if they become exposed. */
> > > > 		do_buffered_writes(fd, 1);
> > > > 		fsync(fd);
> > > > 
> > > > 		assert("ftruncate", ftruncate(fd, 0) == 0);
> > > > 		fsync(fd);
> > > > 	} while (1);
> > > > 	
> > > > 	return err;
> > > > }
> > > > 
> > 
> 

[-- Attachment #2: linux-2.6.0-test10-mm1.fdatawait.patch --]
[-- Type: text/plain, Size: 839 bytes --]

--- linux-2.6.0-test10-mm1.ddm/mm/filemap.c	2003-12-05 17:14:57.000000000 -0800
+++ linux-2.6.0-test10-mm1.ddm.DIO/mm/filemap.c	2003-12-16 17:03:02.801786218 -0800
@@ -188,6 +188,20 @@ restart:
 		struct page *page;
 
 		page = list_entry(mapping->locked_pages.next,struct page,list);
+		/*
+		 * If the page is locked, it might be in the process of being
+		 * set up for writeback but without PG_writeback set
+		 * and with PG_dirty cleared.
+		 * (PG_dirty is cleared BEFORE PG_writeback is set)
+		 * So, wait for the PG_locked to clear, then start over.
+		 */
+		if (PageLocked(page)) {
+			page_cache_get(page);
+			spin_unlock(&mapping->page_lock);
+			wait_on_page_locked(page);
+			page_cache_release(page);
+			goto restart;
+		}
 		list_del(&page->list);
 		if (PageDirty(page))
 			list_add(&page->list, &mapping->dirty_pages);


* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-17  1:25       ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Daniel McNeil
@ 2003-12-17  2:03         ` Andrew Morton
  2003-12-17 19:25           ` Daniel McNeil
  2003-12-31  9:18           ` Suparna Bhattacharya
  0 siblings, 2 replies; 58+ messages in thread
From: Andrew Morton @ 2003-12-17  2:03 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: janetmor, suparna, pbadari, linux-aio, linux-kernel

Daniel McNeil <daniel@osdl.org> wrote:
>
> I have found something else that might be causing the problem.
> filemap_fdatawait() skips pages that are not marked PG_writeback.
> However, when a page is about to be written, PG_dirty is cleared
> before PG_writeback is set (while PG_locked is held).  So it
> looks like filemap_fdatawait() can see a page just before it is
> going to be written and not wait for it.  Here is a patch that
> makes filemap_fdatawait() wait for locked pages as well, to make
> sure it does not miss any pages.

This filemap_fdatawait() behaviour is as-designed.  That function is only
responsible for waiting on I/O which was initiated prior to it being
invoked, because it is designed for fsync(), fdatasync(), O_SYNC, msync(),
sync(), etc.

Now, it could be that this behaviour is not appropriate for the O_DIRECT
sync function - the result of your testing will be interesting.
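The window under discussion can be modelled in a few lines of userspace C. This is purely an illustrative sketch: the flag values and helper names below are made up, and only mimic the ordering described in the thread (PG_dirty cleared under the page lock before PG_writeback is set), not the kernel's actual code.

```c
#include <assert.h>

/* Illustrative model only -- these are not the kernel's flag values. */
enum { PG_locked = 1, PG_dirty = 2, PG_writeback = 4 };

/* Mimics the start of writepage: the page is locked and PG_dirty is
 * cleared BEFORE PG_writeback is set, so for a moment the page is
 * neither dirty nor under writeback -- only locked. */
static int begin_writepage(int flags)
{
	flags |= PG_locked;	/* lock_page() */
	flags &= ~PG_dirty;	/* test_clear_page_dirty() */
	return flags;		/* PG_writeback not yet set */
}

/* Stock filemap_fdatawait() only waits on PG_writeback pages, so a
 * page caught in the window above is skipped. */
static int fdatawait_would_wait(int flags)
{
	return (flags & PG_writeback) != 0;
}

/* Daniel's patch also waits on merely-locked pages, closing the gap. */
static int patched_fdatawait_would_wait(int flags)
{
	return (flags & (PG_writeback | PG_locked)) != 0;
}
```

A dirty page fed through begin_writepage() is skipped by the stock check but caught by the patched one, which is exactly the behavioural difference the patch introduces.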




* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-17  2:03         ` Andrew Morton
@ 2003-12-17 19:25           ` Daniel McNeil
  2003-12-17 20:17             ` Janet Morgan
  2003-12-31  9:18           ` Suparna Bhattacharya
  1 sibling, 1 reply; 58+ messages in thread
From: Daniel McNeil @ 2003-12-17 19:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: janetmor, Suparna Bhattacharya, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List

On Tue, 2003-12-16 at 18:03, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > I have found something else that might be causing the problem.
> > filemap_fdatawait() skips pages that are not marked PG_writeback.
> > However, when a page is going to be written, PG_dirty is cleared
> > before PG_writeback is set (while the PG_locked is set).  So it
> > looks like filemap_fdatawait() can see a page just before it is
> > going to be written and not wait for it.  Here is a patch that
> > makes filemap_fdatawait() wait for locked pages as well to make
> > sure it does not miss any pages.
> 
> This filemap_fdatawait() behaviour is as-designed.  That function is only
> responsible for waiting on I/O which was initiated prior to it being
> invoked.  Because it is designed for fsync(), fdatasync(), O_SYNC, msync(),
> sync(), etc.
> 
> Now, it could be that this behaviour is not appropriate for the O_DIRECT
> sync function - the result of your testing will be interesting.
> 

My tests still failed overnight.  I was thinking that maybe a
non-blocking do_writepages() was happening at the same time as
the filemap_fdatawrite()/filemap_fdatawait(), so even though the
page was dirty before the filemap_fdatawrite(), it was missed.

Daniel



* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-17 19:25           ` Daniel McNeil
@ 2003-12-17 20:17             ` Janet Morgan
  0 siblings, 0 replies; 58+ messages in thread
From: Janet Morgan @ 2003-12-17 20:17 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Suparna Bhattacharya, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List

Daniel McNeil wrote:

>On Tue, 2003-12-16 at 18:03, Andrew Morton wrote:
>
>>Daniel McNeil <daniel@osdl.org> wrote:
>>
>>>I have found something else that might be causing the problem.
>>>filemap_fdatawait() skips pages that are not marked PG_writeback.
>>>However, when a page is going to be written, PG_dirty is cleared
>>>before PG_writeback is set (while the PG_locked is set).  So it
>>>looks like filemap_fdatawait() can see a page just before it is
>>>going to be written and not wait for it.  Here is a patch that
>>>makes filemap_fdatawait() wait for locked pages as well to make
>>>sure it does not miss any pages.
>>>
>>This filemap_fdatawait() behaviour is as-designed.  That function is only
>>responsible for waiting on I/O which was initiated prior to it being
>>invoked.  Because it is designed for fsync(), fdatasync(), O_SYNC, msync(),
>>sync(), etc.
>>
>>Now, it could be that this behaviour is not appropriate for the O_DIRECT
>>sync function - the result of your testing will be interesting.
>>
>
>My tests still failed overnight.  I was thinking that maybe a
>non-blocking do_writepages() was happening at the same time as
>the filemap_fdatawrite()/filemap_fdatawait(), so even though the
>page was dirty before the filemap_fdatawrite(), it was missed.
>
>Daniel
>
>
I'm wondering if processing in generic_file_direct_IO() shouldn't look
more like sys_fsync()? When I add to generic_file_direct_IO() a call to
f_op->fsync() between the calls to filemap_fdatawrite() and
filemap_fdatawait(), the test Daniel and I have been running no longer
fails for me. This change would also seem consistent with 2.4, but I
could be way off base.
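In code terms, the change Janet describes would look roughly like this inside generic_file_direct_IO() (a hypothetical sketch of the idea, not the actual diff; the f_op->fsync signature shown is the 2.6-era one):

```c
/* Sketch only: the ordering Janet describes, not the real patch. */
filemap_fdatawrite(mapping);
if (file->f_op && file->f_op->fsync)
	file->f_op->fsync(file, file->f_dentry, 1);	/* datasync-style */
filemap_fdatawait(mapping);
```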

-Janet



* Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
  2003-12-12  0:51     ` Daniel McNeil
  2003-12-17  1:25       ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Daniel McNeil
@ 2003-12-30  4:53       ` Suparna Bhattacharya
  2003-12-31  0:29         ` Daniel McNeil
  1 sibling, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2003-12-30  4:53 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Janet Morgan, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

On Thu, Dec 11, 2003 at 04:51:33PM -0800, Daniel McNeil wrote:
> I've done more testing with added debug code.
> 
> It looks like the filemap_write_and_wait() call is returning
> with data that has not made it to disk.
> 
> I added code to filemap_write_and_wait() to check if
> mapping->dirty_pages is not empty after calling filemap_fdatawrite()
> and filemap_fdatawait() and retry.  Even with the retry, the test still
> sees uninitialized data when running tests overnight (I added a printk
> so I know the retry is happening).  There are pages left on the
> dirty_pages list even after the write and wait.   

There are two calls to filemap_write_and_wait() for a DIO read
-- do you know in which of these cases you see dirty pages after
the write and wait? The first is called without i_sem protection,
so it is possible for pages to be dirtied by a parallel buffered
write, which is OK because this is just an extra/superfluous call
when it comes to DIO reads. The second call, however, happens with
i_sem held and is used to guarantee that there are no exposures, so
if you are seeing remnant dirty pages in this case it would be
something to worry about.
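Schematically, the DIO read path being discussed has this shape (a simplified sketch reconstructed from the description above; names and exact call sites are illustrative):

```c
/* Sketch of the 2.6.0-test DIO read path -- illustrative only. */
generic_file_direct_IO()	/* O_DIRECT read */
{
	filemap_write_and_wait(mapping);  /* 1st call: i_sem NOT held;
					     can race with parallel
					     buffered writes (harmless
					     for DIO reads) */
	down(&inode->i_sem);
	filemap_write_and_wait(mapping);  /* 2nd call: i_sem held; must
					     leave no dirty pages, or the
					     DIO read may expose stale
					     disk blocks */
	/* ... issue the direct I/O ... */
	up(&inode->i_sem);
}
```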

Regards
Suparna

> 
> I've added a bunch more debug code and am currently running the test
> again to see if I can find out what is going on.
> 
> I'll send more email when I know more.
> 
> Daniel
> 
> 
> 
> On Mon, 2003-12-08 at 10:23, Daniel McNeil wrote:
> > My patch did not fix the problem. :(
> > 
> > The tests over the weekend reported bad data.
> > 
> > The sampling of the i_size after dropping i_sem still looks wrong
> > to me.  I will keep looking and see if I can find another problem.
> > 
> > 
> > Daniel
> > 
> > On Fri, 2003-12-05 at 17:29, Daniel McNeil wrote:
> > > Janet,
> > > 
> > > I think I found the problem that is causing the dio reads to get
> > > uninitialized data with buffered writes.  i_size is sampled after
> > > i_sem is dropped and after the i/o has completed.  i_size could
> > > have changed in the meantime.  This patch samples i_size before
> > > dropping i_sem.  I'll leave the test running over the weekend to
> > > verify.
> > > 
> > > Daniel
> > > 
> > > On Tue, 2003-12-02 at 18:33, Janet Morgan wrote:
> > > > Hi Suparna, Daniel,
> > > > 
> > > > Just wanted to let you know that I seem to be hitting a failure when I
> > > > run the testcase (attached) that Stephen Tweedie wrote to expose the DIO
> > > > races (the testcase issues buffered writes and dio reads).  I'm trying to
> > > > debug the failure, but I know it will be slow going for me.
> > > > 
> > > > I hit the problem when I run multiple instances of the test against
> > > > the same filesystem (I do specify a unique file for each instance of
> > > > the test).   I normally run about 6 instances of the test, e.g.,
> > > > "direct_read_under foo1", "direct_read_under foo2", etc.   The test
> > > > runs in an infinite loop until a data mis-compare is detected.
> > > > I have not been able to reproduce the failure when I restrict each
> > > > instance of the test to its own filesystem.
> > > > 
> > > > I've tried running with the combinations below and I eventually
> > > > get a failure in each case.   I assume all but the last combo includes
> > > > the critical aio/dio fixes for the associated base.
> > > > 
> > > > Combinations tested:
> > > > 
> > > >    Daniel's latest:
> > > >        2.6.0-test9-mm5 +
> > > >        aio-dio-fallback-bio_count-race.patch +
> > > >        direct-io-memleak-fix.patch
> > > > 
> > > >    Suparna's latest:
> > > >        2.6.0-test9-mm5 +
> > > >        Suparna's current patches:
> > > >            
> > > > http://marc.theaimsgroup.com/?l=linux-aio&m=106983304420570&w=2  +
> > > >            
> > > > http://marc.theaimsgroup.com/?l=linux-kernel&m=106904658121299&w=2  +
> > > >        direct-io-memleak-fix.patch
> > > >  
> > > >     stock linux-2.6.0-test9 plus:
> > > >       aio-refcnt.patch  +
> > > >       dio-aio-fixes.patch +
> > > >       dio-aio-fixes-fixes.patch +
> > > >       O_DIRECT-race-fixes-rollup.patch 
> > > > 
> > > >     stock linux-2.6.0-test11 plus:
> > > >       O_DIRECT-race-fixes-rollup.patch
> > > >  
> > > > Thanks,
> > > > -Janet
> > > > 
> > > > ______________________________________________________________________
> > > > 
> > > > #define _XOPEN_SOURCE 600
> > > > #define _GNU_SOURCE
> > > > 
> > > > #include <unistd.h>
> > > > #include <stdlib.h>
> > > > #include <stdio.h>
> > > > #include <string.h>
> > > > #include <errno.h>
> > > > #include <sys/fcntl.h>
> > > > #include <sys/mman.h>
> > > > #include <sys/wait.h>
> > > > 
> > > > #define BIGSIZE 128*1024*1024
> > > > #define READSIZE 32*1024*1024
> > > > #define WRITESIZE 32*1024*1024
> > > > 
> > > > int pagesize;
> > > > char *iobuf;
> > > > 
> > > > void assert(const char *what, int assertion)
> > > > {
> > > > 	if (assertion)
> > > > 		return;
> > > > 	perror(what);
> > > > 	exit(1);
> > > > }
> > > > 
> > > > void do_buffered_writes(int fd, int pattern)
> > > > {
> > > > 	int rc;
> > > > 	int offset;
> > > > 	
> > > > 	memset(iobuf, pattern, WRITESIZE);
> > > > 	for (offset = 0; offset+WRITESIZE <= BIGSIZE; offset += WRITESIZE) {
> > > > 		rc = pwrite(fd, iobuf, WRITESIZE, offset);
> > > > 		assert("pwrite", rc >= 0);
> > > > 		if (rc != WRITESIZE) {
> > > > 			fprintf(stderr, "short write (%d out of %d)\n",
> > > > 				rc, WRITESIZE);
> > > > 			exit(1);
> > > > 		}
> > > > 		fsync(fd);
> > > > 	}
> > > > }
> > > > 
> > > > int do_direct_reads(char *filename)
> > > > {
> > > > 	int fd;
> > > > 	int offset;
> > > > 	int rc, i;
> > > > 	int *p;
> > > > 	
> > > > 	fd = open(filename, O_DIRECT|O_RDONLY, 0);
> > > > 	assert("open", fd >= 0);
> > > > 
> > > > 	for (offset = 0; offset+READSIZE <= BIGSIZE; offset += READSIZE) {
> > > > 		rc = pread(fd, iobuf, READSIZE, offset);
> > > > 		assert("pread", rc >= 0);
> > > > 		if (rc != READSIZE) {
> > > > 			fprintf(stderr, "short read (%d out of %d)\n",
> > > > 				rc, READSIZE);
> > > > 			exit(1);
> > > > 		}
> > > > 		for (i=0, p=(int *)iobuf; i<READSIZE; i+=4) {
> > > > 			if (*p) {
> > > > 				fprintf(stderr,
> > > > 					"Found data (%08x) at offset %d+%d\n",
> > > > 					*p, offset, i);
> > > > 				return 1;
> > > > 			}
> > > > 			p++;
> > > > 		}
> > > > 	}
> > > > 	return 0;
> > > > }
> > > > 
> > > > int main(int argc, char *argv[])
> > > > {
> > > > 	char *filename;
> > > > 	int fd;
> > > > 	int pid;
> > > > 	int err;
> > > > 	int pass = 0;
> > > > 	int bufsize;
> > > > 	
> > > > 	if (argc != 2) {
> > > > 		fprintf(stderr, "Needs a filename as an argument.\n");
> > > > 		exit(1);
> > > > 	}
> > > > 	
> > > > 	filename = argv[1];
> > > > 	
> > > > 	pagesize = getpagesize();
> > > > 	bufsize = READSIZE;
> > > > 	if (WRITESIZE > READSIZE)
> > > > 		bufsize = WRITESIZE;
> > > > 	err = posix_memalign((void**) &iobuf, pagesize, bufsize);
> > > > 	if (err) {
> > > > 		fprintf(stderr, "Error allocating %d aligned bytes.\n", bufsize);
> > > > 		exit(1);
> > > > 	}
> > > > 	
> > > > 	fd = open(filename, O_CREAT|O_TRUNC|O_RDWR, 0666);
> > > > 	assert("open", fd >= 0);
> > > > 	
> > > > 	do {
> > > > 		printf("Pass %d...\n", ++pass);
> > > > 		
> > > > 		assert("ftruncate", ftruncate(fd, BIGSIZE) == 0);
> > > > 		fsync(fd);
> > > > 
> > > > 		pid = fork();
> > > > 		assert("fork", pid >= 0);
> > > > 		
> > > > 		if (!pid) {
> > > > 			do_buffered_writes(fd, 0);
> > > > 			exit(0);
> > > > 		}
> > > > 		
> > > > 		err = do_direct_reads(filename);
> > > > 
> > > > 		wait4(pid, NULL, WNOHANG, 0);
> > > > 		
> > > > 		if (err) 
> > > > 			break;
> > > > 
> > > > 		/* Fill the file with a known pattern so that the blocks
> > > > 		 * on disk can be detected if they become exposed. */
> > > > 		do_buffered_writes(fd, 1);
> > > > 		fsync(fd);
> > > > 
> > > > 		assert("ftruncate", ftruncate(fd, 0) == 0);
> > > > 		fsync(fd);
> > > > 	} while (1);
> > > > 	
> > > > 	return err;
> > > > }
> > > > 
> > 
> 

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
  2003-12-30  4:53       ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Suparna Bhattacharya
@ 2003-12-31  0:29         ` Daniel McNeil
  2003-12-31  6:09           ` Suparna Bhattacharya
  2004-03-29 15:41           ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Suparna Bhattacharya
  0 siblings, 2 replies; 58+ messages in thread
From: Daniel McNeil @ 2003-12-31  0:29 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Janet Morgan, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

On Mon, 2003-12-29 at 20:53, Suparna Bhattacharya wrote:
> On Thu, Dec 11, 2003 at 04:51:33PM -0800, Daniel McNeil wrote:
> > I've done more testing with added debug code.
> > 
> > It looks like the filemap_write_and_wait() call is returning
> > with data that has not made it to disk.
> > 
> > I added code to filemap_write_and_wait() to check if
> > mapping->dirty_pages is not empty after calling filemap_fdatawrite()
> > and filemap_fdatawait() and retry.  Even with the retry, the test still
> > sees uninitialized data when running tests overnight (I added a printk
> > so I know the retry is happening).  There are pages left on the
> > dirty_pages list even after the write and wait.   
> 
> There are two calls to filemap_write_and_wait() for a DIO read
> -- do you know in which of these cases you see dirty pages after
> the write and wait? The first is called without i_sem protection,
> so it is possible for pages to be dirtied by a parallel buffered
> write, which is OK because this is just an extra/superfluous call
> when it comes to DIO reads. The second call, however, happens with
> i_sem held and is used to guarantee that there are no exposures, so
> if you are seeing remnant dirty pages in this case it would be
> something to worry about.
> 

Yes, there are two calls. I'm assuming that the retry was caused by the
call w/o holding i_sem, since pages can be dirtied in parallel.

My debug output shows every filemap_write_and_wait() and which pages are
on the dirty, io, and locked list and which pages are on the clean list
after the wait. (yes, this is lots of output)

I changed the test a little bit -- if the test sees non-zero data, it
opens a new file descriptor and re-reads the data without O_DIRECT and
also re-reads the 1st page of the non-zero data using O_DIRECT.

What the debug output shows is:

The 1st filemap_write_and_wait() writes out some data.

The 2nd filemap_write_and_wait() sees different dirty pages.  The pages
  that are seen non-zero by the test (many pages -- 243 in one case) are
  on the dirty_pages list before the write and on the clean_pages list
  after the wait.  But some of the pages at the end of the clean list
  are seen by the test program as non-zero.

Since I added the DIO re-read, the debug output shows for the 2nd
  re-read:

the 1st filemap_write_and_wait() with NO dirty pages
the 2nd filemap_write_and_wait() with dirty pages AFTER the pages
  that mis-matched and the original plus these pages on the clean
  list after the wait.

The 2nd re-read and the read without O_DIRECT both get zeroed data.

Conclusions:
===========

I'm not sure. :(

It appears that the pages are put on the clean list and PG_writeback is
cleared while the i/o is still in flight, and the DIO reads are put on
the queue ahead of the writeback.

I am trying to figure out how to add more debug code to prove this
or figure out what else is going on.

I'm open to suggestions.

Daniel



* Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
  2003-12-31  0:29         ` Daniel McNeil
@ 2003-12-31  6:09           ` Suparna Bhattacharya
  2004-01-08 23:55             ` Daniel McNeil
  2004-03-29 15:41           ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Suparna Bhattacharya
  1 sibling, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2003-12-31  6:09 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Janet Morgan, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

On Tue, Dec 30, 2003 at 04:29:17PM -0800, Daniel McNeil wrote:
> On Mon, 2003-12-29 at 20:53, Suparna Bhattacharya wrote:
> > On Thu, Dec 11, 2003 at 04:51:33PM -0800, Daniel McNeil wrote:
> > > I've done more testing with added debug code.
> > > 
> > > It looks like the filemap_write_and_wait() call is returning
> > > with data that has not made it to disk.
> > > 
> > > I added code to filemap_write_and_wait() to check if
> > > mapping->dirty_pages is not empty after calling filemap_fdatawrite()
> > > and filemap_fdatawait() and retry.  Even with the retry, the test still
> > > sees uninitialized data when running tests overnight (I added a printk
> > > so I know the retry is happening).  There are pages left on the
> > > dirty_pages list even after the write and wait.   
> > 
> > There are two calls to filemap_write_and_wait() for a DIO read
> > -- do you know in which of these cases you see dirty pages after
> > the write and wait? The first is called without i_sem protection,
> > so it is possible for pages to be dirtied by a parallel buffered
> > write, which is OK because this is just an extra/superfluous call
> > when it comes to DIO reads. The second call, however, happens with
> > i_sem held and is used to guarantee that there are no exposures, so
> > if you are seeing remnant dirty pages in this case it would be
> > something to worry about.
> > 
> 
> Yes there are two calls.  I'm assuming that the retry was caused by the
> call w/o holding the i_sem, since pages can be dirtied in parallel.
> 
> My debug output shows every filemap_write_and_wait() and which pages are
> on the dirty, io, and locked list and which pages are on the clean list
> after the wait. (yes, this is lots of output)
> 
> I changed the test a little bit -- if the test sees non-zero data, it
> opens a new file descriptor and re-reads the data without O_DIRECT and
> also re-reads the 1st page of the non-zero data using O_DIRECT.
> 
> What the debug output shows is:
> 
> The 1st filemap_write_and_wait() writes out some data.
> 
> The 2nd filemap_write_and_wait() sees different dirty pages.  The pages
>   that are seen non-zero by the test (many pages -- 243 in one case) are
>   on the dirty_pages list before the write and on the clean_pages list
>   after the wait.  But some of the pages at the end of the clean list
>   are seen by the test program as non-zero.
> 
> Since I added the DIO re-read, the debug output shows for the 2nd
>   re-read:
> 
> the 1st filemap_write_and_wait() with NO dirty pages
> the 2nd filemap_write_and_wait() with dirty pages AFTER the pages
>   that mis-matched and the original plus these pages on the clean
>   list after the wait.
> 
> The 2nd re-read and the read without O_DIRECT both get zeroed data.
> 
> Conclusions:
> ===========
> 
> I'm not sure. :(
> 
> It appears that the pages are put on the clean list and PG_writeback is
> cleared while the i/o is still in flight, and the DIO reads are put on
> the queue ahead of the writeback.
> 
> I am trying to figure out how to add more debug code to prove this
> or figure out what else is going on.
> 
> I'm open to suggestions.

Since the first filemap_write_and_wait call is redundant and somewhat
suspect since it's called w/o i_sem (I can think of unexpected side effects
on parallel filemap_write_and_wait calls), have you thought of disabling it
and then trying to see if you can still recreate the problem? It may
not make a difference, but it seems like the right thing to do and could
at least simplify some of the debugging.

Regards
Suparna

> 
> Daniel
> 

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-17  2:03         ` Andrew Morton
  2003-12-17 19:25           ` Daniel McNeil
@ 2003-12-31  9:18           ` Suparna Bhattacharya
  2003-12-31  9:35             ` Andrew Morton
  1 sibling, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2003-12-31  9:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Daniel McNeil, janetmor, pbadari, linux-aio, linux-kernel

On Tue, Dec 16, 2003 at 06:03:19PM -0800, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > I have found something else that might be causing the problem.
> > filemap_fdatawait() skips pages that are not marked PG_writeback.
> > However, when a page is going to be written, PG_dirty is cleared
> > before PG_writeback is set (while the PG_locked is set).  So it
> > looks like filemap_fdatawait() can see a page just before it is
> > going to be written and not wait for it.  Here is a patch that
> > makes filemap_fdatawait() wait for locked pages as well to make
> > sure it does not missed any pages.
> 
> This filemap_fdatawait() behaviour is as-designed.  That function is only
> responsible for waiting on I/O which was initiated prior to it being
> invoked.  Because it is designed for fsync(), fdatasync(), O_SYNC, msync(),
> sync(), etc.

Hmm, the filemap_fdatawrite - fdatawait combination as a whole is responsible
for waiting for writes initiated prior to it being invoked to get to disk,
isn't it?

It seems like the case Daniel is thinking about is when
a process has issued writes to the page cache, and then filemap_fdatawrite
is called, while these pages are being written back to disk in parallel by
another thread (not holding i_sem). The filemap_fdatawrite wouldn't see
pages that are in the process of being written out by the background
thread, so it doesn't mark them for writeback. The following
filemap_fdatawait would find these pages on the locked_pages list all
right, but if it's unlucky enough to be in the window that Daniel mentions,
where PG_dirty is cleared but PG_writeback hasn't yet been set, then the
page would have moved to the clean list without waiting for the actual
writeout to complete!

So if this is a valid race ... then this could mean that fsync et al 
may potentially return before all prior writes have actually been 
sync'ed. In practice, possibly the sync'ing of metadata ends up waiting 
for i/o that might be ordered later in the queue, and hence effectively 
sees the data i/o completed as well. 
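Laid out as a timeline, the interleaving described above would run something like this (a hypothetical schedule reconstructed from the discussion, not actual kernel code):

```text
background writeout thread          fsync-path thread
--------------------------          -----------------
lock_page(page)
test_clear_page_dirty(page)
  /* window: PG_dirty clear,
     PG_writeback not yet set */
                                    filemap_fdatawrite()
                                      /* page no longer dirty:
                                         nothing to initiate */
                                    filemap_fdatawait()
                                      /* !PageWriteback(page):
                                         page is skipped */
set_page_writeback(page)
submit the write I/O
unlock_page(page)
                                    /* fsync returns while the
                                       write is still in flight */
```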

> 
> Now, it could be that this behaviour is not appropriate for the O_DIRECT
> sync function - the result of your testing will be interesting.
> 
> 

The above reasoning applies not just to DIO but to regular buffered i/o -
fsync kind of situations as well.

But since Daniel's tests failed even with the patch, maybe this isn't
the cause for the latest DIO exposures. Though that doesn't mean the race
doesn't exist.

I guess it would help to see what happens without the first (non i_sem 
protected) filemap_write_and_wait in DIO.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31  9:18           ` Suparna Bhattacharya
@ 2003-12-31  9:35             ` Andrew Morton
  2003-12-31  9:55               ` Suparna Bhattacharya
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2003-12-31  9:35 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
> It seems like the case Daniel is thinking about is when 
> a process has issued writes to the page cache, and then filemap_fdatawrite
> is called, while these pages are being written back to disk in parallel by
> another thread (not holding i_sem). The filemap_fdatawrite wouldn't see
> pages that are in the process of being written out by the background
> thread so it doesn't mark them for writeback.

If those pages are under writeout then they are clean and
filemap_fdatawrite() has nothing to do.

If, however, those pages were redirtied while under I/O then they are
dirty, on dirty_pages and are under writeout.  In that case
filemap_fdatawrite() must wait for the current write to complete and must
start a new write.

> The following filemap_fdatawait 
> would find these pages on the locked_pages list all right, but if it's
> unlucky enough to be in the window that Daniel mentions where PG_dirty 
> is cleared but PG_writeback hasn't yet been set, then the page would
> have moved to the clean list without waiting for the actual writeout
> to complete!

If you are referring to this code in mpage_writepage():

		lock_page(page);

		if (wbc->sync_mode != WB_SYNC_NONE)
			wait_on_page_writeback(page);

		if (page->mapping == mapping && !PageWriteback(page) &&
					test_clear_page_dirty(page)) {


then I don't see the race - the page lock synchronises the two threads?




* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31  9:35             ` Andrew Morton
@ 2003-12-31  9:55               ` Suparna Bhattacharya
  2003-12-31  9:59                 ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2003-12-31  9:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

On Wed, Dec 31, 2003 at 01:35:21AM -0800, Andrew Morton wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> >
> > It seems like the case Daniel is thinking about is when 
> > a process has issued writes to the page cache, and then filemap_fdatawrite
> > is called, while these pages are being written back to disk in parallel by
> > another thread (not holding i_sem). The filemap_fdatawrite wouldn't see
> > pages that are in the process of being written out by the background
> > thread so it doesn't mark them for writeback.
> 
> If those pages are under writeout then they are clean and
> filemap_fdatawrite() has nothing to do.

Sure, it doesn't, because the other thread is about to issue
a writeout on them. But it does mean that there is no guarantee
that those pages are already marked as writeback when 
filemap_fdatawrite returns and filemap_fdatawait is called.

> 
> If, however, those pages were redirtied while under I/O then they are
> dirty, on dirty_pages and are under writeout.  In that case
> filemap_fdatawrite() must wait for the current write to complete and must
> start a new write.
> 
> > The following filemap_fdatawait 
> > would find these pages on the locked_pages list all right, but if it's
> > unlucky enough to be in the window that Daniel mentions where PG_dirty 
> > is cleared but PG_writeback hasn't yet been set, then the page would
> > have moved to the clean list without waiting for the actual writeout
> > to complete!
> 
> If you are referring to this code in mpage_writepage():
> 
> 		lock_page(page);
> 
> 		if (wbc->sync_mode != WB_SYNC_NONE)
> 			wait_on_page_writeback(page);
> 
> 		if (page->mapping == mapping && !PageWriteback(page) &&
> 					test_clear_page_dirty(page)) {
> 
> 
> then I don't see the race - the page lock synchronises the two threads?
> 

But filemap_fdatawait does not look at the page lock. So there's a
tiny window when the page is on locked_pages with PG_dirty cleared
and PG_writeback not set. Does that make sense, or is there something
I've overlooked?

Daniel's patch tried precisely to fill that gap - he added a check for the
page lock in filemap_fdatawait.

Regards
Suparna

> 

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31  9:55               ` Suparna Bhattacharya
@ 2003-12-31  9:59                 ` Andrew Morton
  2003-12-31 10:09                   ` Suparna Bhattacharya
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2003-12-31  9:59 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
> > If you are referring to this code in mpage_writepage():
>  > 
>  > 		lock_page(page);
>  > 
>  > 		if (wbc->sync_mode != WB_SYNC_NONE)
>  > 			wait_on_page_writeback(page);
>  > 
>  > 		if (page->mapping == mapping && !PageWriteback(page) &&
>  > 					test_clear_page_dirty(page)) {
>  > 
>  > 
>  > then I don't see the race - the page lock synchronises the two threads?
>  > 
> 
>  But filemap_fdatawait does not look at the page lock. So there's a
>  tiny window when the page is on locked_pages with PG_dirty cleared
>  and PG_writeback not set.

OK, but the thread which is running fdatawrite/fdatawait isn't interested
in that page, because it must have been dirtied _after_ this thread has
passed through filemap_fdatawrite(), yes?

Which is desired behaviour for fsync(), but perhaps not suitable when this
code is reused for the O_DIRECT pagecache synchronisation function.



* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31  9:59                 ` Andrew Morton
@ 2003-12-31 10:09                   ` Suparna Bhattacharya
  2003-12-31 10:10                     ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2003-12-31 10:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

On Wed, Dec 31, 2003 at 01:59:13AM -0800, Andrew Morton wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> >
> > > If you are referring to this code in mpage_writepage():
> >  > 
> >  > 		lock_page(page);
> >  > 
> >  > 		if (wbc->sync_mode != WB_SYNC_NONE)
> >  > 			wait_on_page_writeback(page);
> >  > 
> >  > 		if (page->mapping == mapping && !PageWriteback(page) &&
> >  > 					test_clear_page_dirty(page)) {
> >  > 
> >  > 
> >  > then I don't see the race - the page lock synchronises the two threads?
> >  > 
> > 
> >  But filemap_fdatawait does not look at the page lock. So there's a
> >  tiny window when the page is on locked_pages with PG_dirty cleared
> >  and PG_writeback not set.
> 
> OK, but the thread which is running fdatawrite/fdatawait isn't interested
> in that page, because it must have been dirtied _after_ this thread has
> passed through filemap_fdatawrite(), yes?

Not exactly. The page could actually have been dirtied _before_ this
thread passes through filemap_fdatawrite, but is just being written
back in parallel by a background thread.

Regards
Suparna

> 
> Which is desired behaviour for fsync(), but perhaps not suitable when this
> code is reused for the O_DIRECT pagecache synchronisation function.
> 

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31 10:09                   ` Suparna Bhattacharya
@ 2003-12-31 10:10                     ` Andrew Morton
  2003-12-31 10:48                       ` Suparna Bhattacharya
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2003-12-31 10:10 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
>  > OK, but the thread which is running fdatawrite/fdatawait isn't interested
>  > in that page, because it must have been dirtied _after_ this thread has
>  > passed through filemap_fdatawrite(), yes?
> 
>  Not exactly. The page could actually have been dirtied _before_ this
>  thread passes through filemap_fdatawrite, but is just being written
>  back in parallel by a background thread.

I still don't see it.  If the page was dirtied before this thread entered
filemap_fdatawrite() then this thread will either start writeout
immediately, or will wait on the writeout and start new writeout.  And
the page lock avoids the timing window which you have mentioned.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31 10:10                     ` Andrew Morton
@ 2003-12-31 10:48                       ` Suparna Bhattacharya
  2003-12-31 10:53                         ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2003-12-31 10:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

On Wed, Dec 31, 2003 at 02:10:42AM -0800, Andrew Morton wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> >
> >  > OK, but the thread which is running fdatawrite/fdatawait isn't interested
> >  > in that page, because it must have been dirtied _after_ this thread has
> >  > passed through filemap_fdatawrite(), yes?
> > 
> >  Not exactly. The page could actually have been dirtied _before_ this
> >  thread passes through filemap_fdatawrite, but is just being written
> >  back in parallel by a background thread.
> 
> I still don't see it.  If the page was dirtied before this thread entered
> filemap_fdatawrite() then this thread will either start writeout
> immediately, or will wait on the writeout and start new writeout.  And
> the page lock avoids the timing window which you have mentioned.

From filemap_fdatawait:

while (!list_empty(&mapping->locked_pages)) {
                struct page *page;

                page = list_entry(mapping->locked_pages.next,struct page,list);
                list_del(&page->list);
                if (PageDirty(page))
                        list_add(&page->list, &mapping->dirty_pages);
                else
                        list_add(&page->list, &mapping->clean_pages);

---->           if (!PageWriteback(page)) {
                        if (++progress > 32) {
                                if (need_resched()) {
                                        spin_unlock(&mapping->page_lock);
                                        __cond_resched();
                                        goto restart;
                                }
                        }
---->                   continue;
                }

If PG_writeback is not set, it moves on to process the next page -- without
falling through to wait_on_page_writeback, isn't that so?
Another thread could have pulled the page off io_pages, put it on
locked_pages, and not set PG_writeback as yet.

How does the page lock guard against this?
filemap_fdatawait seems to ignore the page lock.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31 10:48                       ` Suparna Bhattacharya
@ 2003-12-31 10:53                         ` Andrew Morton
  2003-12-31 10:54                           ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2003-12-31 10:53 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
> If PG_writeback is not set, it moves on to process the next page -- without
>  falling through to wait_on_page_writeback, isn't that so ? 
>  Another thread could have pulled the page off io_pages, put it on locked
>  pages, and not set PG_writeback as yet.

OK, so a (well commented!) lock_page() in filemap_fdatawait() would seem to
plug that.  I recall being worried that this might be livelockable against
ongoing read activity, but that doesn't sound right - such pages won't be
on ->locked_pages.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31 10:53                         ` Andrew Morton
@ 2003-12-31 10:54                           ` Andrew Morton
  2003-12-31 11:17                             ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2003-12-31 10:54 UTC (permalink / raw)
  To: suparna, daniel, janetmor, pbadari, linux-aio, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:
>
> OK, so a (well commented!) lock_page() in filemap_fdatawait() would seem to
>  plug that.

No it won't.  Let me actually think about this a bit.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31 10:54                           ` Andrew Morton
@ 2003-12-31 11:17                             ` Andrew Morton
  2003-12-31 22:34                               ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Daniel McNeil
                                                 ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Andrew Morton @ 2003-12-31 11:17 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:
>
> Let me actually think about this a bit.

Nasty.  The same race is present in 2.4.x...

How's about we start new I/O in filemap_fdatawait() if the page is dirty?


diff -puN mm/filemap.c~a mm/filemap.c
--- 25/mm/filemap.c~a	2003-12-31 03:10:29.000000000 -0800
+++ 25-akpm/mm/filemap.c	2003-12-31 03:17:05.000000000 -0800
@@ -206,7 +206,13 @@ restart:
 		page_cache_get(page);
 		spin_unlock(&mapping->page_lock);
 
-		wait_on_page_writeback(page);
+		lock_page(page);
+		if (PageDirty(page) && mapping->a_ops->writepage) {
+			write_one_page(page, 1);
+		} else {
+			wait_on_page_writeback(page);
+			unlock_page(page);
+		}
 		if (PageError(page))
 			ret = -EIO;
 



^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch
  2003-12-31 11:17                             ` Andrew Morton
@ 2003-12-31 22:34                               ` Daniel McNeil
  2003-12-31 22:41                                 ` [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch Daniel McNeil
                                                   ` (2 more replies)
  2004-01-02  5:50                               ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Suparna Bhattacharya
  2004-01-11 23:14                               ` Janet Morgan
  2 siblings, 3 replies; 58+ messages in thread
From: Daniel McNeil @ 2003-12-31 22:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Suparna Bhattacharya, janetmor, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]

The other potential race in filemap_fdatawait() was that it removed
the page from the locked list while waiting for writeback, so a 2nd
filemap_fdatawait() running on another cpu would not wait for the page
being written, since it would never see it on the list.

I've been running with a filemap_fdatawait() that waits on the locked
page and on writeback with the page still on the locked list.  Any
other filemap_fdatawait() would then see the page being written and
wait.

I've re-diffed this patch against 2.6.1-rc1-mm1.

Thoughts?

Daniel



On Wed, 2003-12-31 at 03:17, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > Let me actually think about this a bit.
> 
> Nasty.  The same race is present in 2.4.x...
> 
> How's about we start new I/O in filemap_fdatawait() if the page is dirty?
> 
> 
> diff -puN mm/filemap.c~a mm/filemap.c
> --- 25/mm/filemap.c~a	2003-12-31 03:10:29.000000000 -0800
> +++ 25-akpm/mm/filemap.c	2003-12-31 03:17:05.000000000 -0800
> @@ -206,7 +206,13 @@ restart:
>  		page_cache_get(page);
>  		spin_unlock(&mapping->page_lock);
>  
> -		wait_on_page_writeback(page);
> +		lock_page(page);
> +		if (PageDirty(page) && mapping->a_ops->writepage) {
> +			write_one_page(page, 1);
> +		} else {
> +			wait_on_page_writeback(page);
> +			unlock_page(page);
> +		}
>  		if (PageError(page))
>  			ret = -EIO;
>  
> 

[-- Attachment #2: linux-2.6.1-rc1-mm1.fdatawait.patch --]
[-- Type: text/plain, Size: 1131 bytes --]

--- linux-2.6.1-rc1-mm1/mm/filemap.c	2003-12-31 10:52:47.039988554 -0800
+++ linux-2.6.1-rc1-mm1.fdatawait/mm/filemap.c	2003-12-31 14:10:43.114197926 -0800
@@ -188,6 +188,32 @@ restart:
 		struct page *page;
 
 		page = list_entry(mapping->locked_pages.next,struct page,list);
+		/*
+		 * If the page is locked, it might be in the process of
+		 * being set up for writeback, but without PG_writeback set
+		 * and with PG_dirty cleared.
+		 * (PG_dirty is cleared BEFORE PG_writeback is set)
+		 * So, wait for PG_locked to clear, then start over.
+		 */
+		if (PageLocked(page)) {
+			page_cache_get(page);
+			spin_unlock(&mapping->page_lock);
+			wait_on_page_locked(page);
+			page_cache_release(page);
+			goto restart;
+		}
+		/*
+		 * Wait for page writeback with it still on the locked list.
+		 */
+		if (PageWriteback(page)) {
+			page_cache_get(page);
+			spin_unlock(&mapping->page_lock);
+			wait_on_page_writeback(page);
+			if (PageError(page))
+				ret = -EIO;
+			page_cache_release(page);
+			goto restart;
+		}
 		list_del(&page->list);
 		if (PageDirty(page))
 			list_add(&page->list, &mapping->dirty_pages);

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2003-12-31 22:34                               ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Daniel McNeil
@ 2003-12-31 22:41                                 ` Daniel McNeil
  2003-12-31 23:46                                   ` Andrew Morton
  2003-12-31 22:47                                 ` [PATCH linux-2.6.1-rc1-mm1] dio_isize.patch Daniel McNeil
  2003-12-31 23:42                                 ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Andrew Morton
  2 siblings, 1 reply; 58+ messages in thread
From: Daniel McNeil @ 2003-12-31 22:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Suparna Bhattacharya, janetmor, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 521 bytes --]

This is an update of the AIO fallback patch, with the bio_count race
fixed by changing bio_list_lock into bio_lock and using that for all
the bio fields.  I changed bio_count and bios_in_flight from atomics
into ints.  They are now protected by the bio_lock.  I fixed the race
in finished_one_bio() by leaving the bio_count at 1 until after the
dio_complete() and then doing the bio_count decrement and wakeup
holding the bio_lock.

I re-diffed to 2.6.1-rc1-mm1.

I've been stressing this code for several weeks.

Daniel

[-- Attachment #2: linux-2.6.1-rc1-mm1.aiodio_fallback_bio_count.patch --]
[-- Type: text/x-patch, Size: 9072 bytes --]

diff -rupN -X /home/daniel/dontdiff linux-2.6.1-rc1-mm1/fs/direct-io.c linux-2.6.1-rc1-mm1.ddm/fs/direct-io.c
--- linux-2.6.1-rc1-mm1/fs/direct-io.c	2003-12-31 10:52:45.940193469 -0800
+++ linux-2.6.1-rc1-mm1.ddm/fs/direct-io.c	2003-12-31 13:29:51.260432002 -0800
@@ -74,6 +74,7 @@ struct dio {
 					   been performed at the start of a
 					   write */
 	int pages_in_io;		/* approximate total IO pages */
+	size_t	size;			/* total request size (doesn't change)*/
 	sector_t block_in_file;		/* Current offset into the underlying
 					   file in dio_block units. */
 	unsigned blocks_available;	/* At block_in_file.  changes */
@@ -115,9 +116,9 @@ struct dio {
 	int page_errors;		/* errno from get_user_pages() */
 
 	/* BIO completion state */
-	atomic_t bio_count;		/* nr bios to be completed */
-	atomic_t bios_in_flight;	/* nr bios in flight */
-	spinlock_t bio_list_lock;	/* protects bio_list */
+	spinlock_t bio_lock;		/* protects BIO fields below */
+	int bio_count;			/* nr bios to be completed */
+	int bios_in_flight;		/* nr bios in flight */
 	struct bio *bio_list;		/* singly linked via bi_private */
 	struct task_struct *waiter;	/* waiting task (NULL if none) */
 
@@ -221,20 +222,38 @@ static void dio_complete(struct dio *dio
  */
 static void finished_one_bio(struct dio *dio)
 {
-	if (atomic_dec_and_test(&dio->bio_count)) {
+	unsigned long flags;
+
+	spin_lock_irqsave(&dio->bio_lock, flags);
+	if (dio->bio_count == 1) {
 		if (dio->is_async) {
+			/*
+			 * Last reference to the dio is going away.
+			 * Drop spinlock and complete the DIO.
+			 */
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
 			/* Complete AIO later if falling back to buffered i/o */
-			if (dio->result != -ENOTBLK) {
+			if (dio->result >= dio->size || dio->rw == READ) {
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
+				return;
 			} else {
+				/*
+				 * Falling back to buffered
+				 */
+				spin_lock_irqsave(&dio->bio_lock, flags);
+				dio->bio_count--;
 				if (dio->waiter)
 					wake_up_process(dio->waiter);
+				spin_unlock_irqrestore(&dio->bio_lock, flags);
+				return;
 			}
 		}
 	}
+	dio->bio_count--;
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 }
 
 static int dio_bio_complete(struct dio *dio, struct bio *bio);
@@ -268,13 +287,13 @@ static int dio_bio_end_io(struct bio *bi
 	if (bio->bi_size)
 		return 1;
 
-	spin_lock_irqsave(&dio->bio_list_lock, flags);
+	spin_lock_irqsave(&dio->bio_lock, flags);
 	bio->bi_private = dio->bio_list;
 	dio->bio_list = bio;
-	atomic_dec(&dio->bios_in_flight);
-	if (dio->waiter && atomic_read(&dio->bios_in_flight) == 0)
+	dio->bios_in_flight--;
+	if (dio->waiter && dio->bios_in_flight == 0)
 		wake_up_process(dio->waiter);
-	spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	return 0;
 }
 
@@ -307,10 +326,13 @@ dio_bio_alloc(struct dio *dio, struct bl
 static void dio_bio_submit(struct dio *dio)
 {
 	struct bio *bio = dio->bio;
+	unsigned long flags;
 
 	bio->bi_private = dio;
-	atomic_inc(&dio->bio_count);
-	atomic_inc(&dio->bios_in_flight);
+	spin_lock_irqsave(&dio->bio_lock, flags);
+	dio->bio_count++;
+	dio->bios_in_flight++;
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	if (dio->is_async && dio->rw == READ)
 		bio_set_pages_dirty(bio);
 	submit_bio(dio->rw, bio);
@@ -336,22 +358,22 @@ static struct bio *dio_await_one(struct 
 	unsigned long flags;
 	struct bio *bio;
 
-	spin_lock_irqsave(&dio->bio_list_lock, flags);
+	spin_lock_irqsave(&dio->bio_lock, flags);
 	while (dio->bio_list == NULL) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (dio->bio_list == NULL) {
 			dio->waiter = current;
-			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			blk_run_queues();
 			io_schedule();
-			spin_lock_irqsave(&dio->bio_list_lock, flags);
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			dio->waiter = NULL;
 		}
 		set_current_state(TASK_RUNNING);
 	}
 	bio = dio->bio_list;
 	dio->bio_list = bio->bi_private;
-	spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	return bio;
 }
 
@@ -393,7 +415,12 @@ static int dio_await_completion(struct d
 	if (dio->bio)
 		dio_bio_submit(dio);
 
-	while (atomic_read(&dio->bio_count)) {
+	/*
+	 * The bio_lock is not held for the read of bio_count.
+	 * This is ok since it is the dio_bio_complete() that changes
+	 * bio_count.
+	 */
+	while (dio->bio_count) {
 		struct bio *bio = dio_await_one(dio);
 		int ret2;
 
@@ -420,10 +447,10 @@ static int dio_bio_reap(struct dio *dio)
 			unsigned long flags;
 			struct bio *bio;
 
-			spin_lock_irqsave(&dio->bio_list_lock, flags);
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			bio = dio->bio_list;
 			dio->bio_list = bio->bi_private;
-			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			ret = dio_bio_complete(dio, bio);
 		}
 		dio->reap_counter = 0;
@@ -889,6 +916,7 @@ direct_io_worker(int rw, struct kiocb *i
 	dio->blkbits = blkbits;
 	dio->blkfactor = inode->i_blkbits - blkbits;
 	dio->start_zero_done = 0;
+	dio->size = 0;
 	dio->block_in_file = offset >> blkbits;
 	dio->blocks_available = 0;
 	dio->cur_page = NULL;
@@ -913,9 +941,9 @@ direct_io_worker(int rw, struct kiocb *i
 	 * (or synchronous) device could take the count to zero while we're
 	 * still submitting BIOs.
 	 */
-	atomic_set(&dio->bio_count, 1);
-	atomic_set(&dio->bios_in_flight, 0);
-	spin_lock_init(&dio->bio_list_lock);
+	dio->bio_count = 1;
+	dio->bios_in_flight = 0;
+	spin_lock_init(&dio->bio_lock);
 	dio->bio_list = NULL;
 	dio->waiter = NULL;
 
@@ -925,7 +953,7 @@ direct_io_worker(int rw, struct kiocb *i
 
 	for (seg = 0; seg < nr_segs; seg++) {
 		user_addr = (unsigned long)iov[seg].iov_base;
-		bytes = iov[seg].iov_len;
+		dio->size += bytes = iov[seg].iov_len;
 
 		/* Index into the first page of the first block */
 		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
@@ -956,6 +984,13 @@ direct_io_worker(int rw, struct kiocb *i
 		}
 	} /* end iovec loop */
 
+	if (ret == -ENOTBLK && rw == WRITE) {
+		/*
+		 * The remaining part of the request will
+		 * be handled by buffered I/O when we return
+		 */
+		ret = 0;
+	}
 	/*
 	 * There may be some unwritten disk at the end of a part-written
 	 * fs-block-sized block.  Go zero that now.
@@ -991,32 +1026,35 @@ direct_io_worker(int rw, struct kiocb *i
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;	/* Bytes written */
-		if (ret == -ENOTBLK) {
-			/*
-			 * The request will be reissued via buffered I/O
-			 * when we return; Any I/O already issued
-			 * effectively becomes redundant.
-			 */
-			dio->result = ret;
+		int should_wait = 0;
+		
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (ret == -ENOTBLK) {
+		if (should_wait) {
+			unsigned long flags;
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
 			 * before returning to fallback on buffered I/O
 			 */
+
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			while (atomic_read(&dio->bio_count)) {
+			while (dio->bio_count) {
+				spin_unlock_irqrestore(&dio->bio_lock, flags);
 				io_schedule();
+				spin_lock_irqsave(&dio->bio_lock, flags);
 				set_current_state(TASK_UNINTERRUPTIBLE);
 			}
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);
@@ -1038,7 +1076,8 @@ direct_io_worker(int rw, struct kiocb *i
 		}
 		dio_complete(dio, offset, ret);
 		/* We could have also come here on an AIO file extend */
-		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
+		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 && 
+			dio->result < dio->size))
 			aio_complete(iocb, ret, 0);
 		kfree(dio);
 	}
diff -rupN -X /home/daniel/dontdiff linux-2.6.1-rc1-mm1/mm/filemap.c linux-2.6.1-rc1-mm1.ddm/mm/filemap.c
--- linux-2.6.1-rc1-mm1/mm/filemap.c	2003-12-31 10:52:47.039988554 -0800
+++ linux-2.6.1-rc1-mm1.ddm/mm/filemap.c	2003-12-31 13:29:51.263431439 -0800
@@ -1908,14 +1908,16 @@ __generic_file_aio_write_nolock(struct k
 		 */
 		if (written >= 0 && file->f_flags & O_SYNC)
 			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-		if (written >= 0 && !is_sync_kiocb(iocb))
+		if (written >= count && !is_sync_kiocb(iocb))
 			written = -EIOCBQUEUED;
-		if (written != -ENOTBLK)
+		if (written < 0 || written >= count)
 			goto out_status;
 		/*
 		 * direct-io write to a hole: fall through to buffered I/O
+		 * for completing the rest of the request.
 		 */
-		written = 0;
+		pos += written;
+		count -= written;
 	}
 
 	buf = iov->iov_base;

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH linux-2.6.1-rc1-mm1] dio_isize.patch
  2003-12-31 22:34                               ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Daniel McNeil
  2003-12-31 22:41                                 ` [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch Daniel McNeil
@ 2003-12-31 22:47                                 ` Daniel McNeil
  2003-12-31 23:42                                 ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Andrew Morton
  2 siblings, 0 replies; 58+ messages in thread
From: Daniel McNeil @ 2003-12-31 22:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Suparna Bhattacharya, janetmor, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 177 bytes --]

This patch samples i_size before dropping i_sem.
i_size could be changed by a racing write, and we could
return uninitialized data.

re-diffed against 2.6.1-rc1-mm1.

Daniel

[-- Attachment #2: linux-2.6.1-rc1-mm1.dio_isize.patch --]
[-- Type: text/plain, Size: 1049 bytes --]

--- linux-2.6.1-rc1-mm1/fs/direct-io.c	2003-12-31 10:52:45.940193469 -0800
+++ linux-2.6.1-rc1-mm1.dio_isize/fs/direct-io.c	2003-12-31 13:56:06.836825169 -0800
@@ -882,6 +882,7 @@ direct_io_worker(int rw, struct kiocb *i
 	int ret = 0;
 	int ret2;
 	size_t bytes;
+	loff_t i_size;
 
 	dio->bio = NULL;
 	dio->inode = inode;
@@ -982,7 +983,12 @@ direct_io_worker(int rw, struct kiocb *i
 	 * All block lookups have been performed. For READ requests
 	 * we can let i_sem go now that its achieved its purpose
 	 * of protecting us from looking up uninitialized blocks.
+	 *
+	 * We also need to sample i_size before we release i_sem to prevent
+	 * a racing write from changing i_size causing us to return
+	 * uninitialized data.
 	 */
+	i_size = i_size_read(inode);
 	if ((rw == READ) && dio->needs_locking)
 		up(&dio->inode->i_sem);
 
@@ -1026,7 +1032,6 @@ direct_io_worker(int rw, struct kiocb *i
 		if (ret == 0)
 			ret = dio->page_errors;
 		if (ret == 0 && dio->result) {
-			loff_t i_size = i_size_read(inode);
 
 			ret = dio->result;
 			/*

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch
  2003-12-31 22:34                               ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Daniel McNeil
  2003-12-31 22:41                                 ` [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch Daniel McNeil
  2003-12-31 22:47                                 ` [PATCH linux-2.6.1-rc1-mm1] dio_isize.patch Daniel McNeil
@ 2003-12-31 23:42                                 ` Andrew Morton
  2004-01-02  4:20                                   ` Suparna Bhattacharya
  2 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2003-12-31 23:42 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: suparna, janetmor, pbadari, linux-aio, linux-kernel

Daniel McNeil <daniel@osdl.org> wrote:
>
> The other potential race in filemap_fdatawait() was that it
> removed the page from the locked list while waiting for a writeback
> and if there was a 2nd filemap_fdatawait() running on another cpu,
> it would not wait for the page being written since it would never see
> it on the list.

That would only happen if one thread or the other was not running under
i_sem.   The only path I see doing that is in generic_file_direct_IO()?

> +		/*
> +		 * If the page is locked, it might be in the process of
> +		 * being set up for writeback, but without PG_writeback set
> +		 * and with PG_dirty cleared.
> +		 * (PG_dirty is cleared BEFORE PG_writeback is set)
> +		 * So, wait for PG_locked to clear, then start over.
> +		 */
> +		if (PageLocked(page)) {
> +			page_cache_get(page);
> +			spin_unlock(&mapping->page_lock);
> +			wait_on_page_locked(page);
> +			page_cache_release(page);
> +			goto restart;
> +		}

Why is this a problem which needs addressing here?  If some other thread is
in the process of starting I/O against this page then the page must have
been clean when this thread ran filemap_fdatawrite()?


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2003-12-31 22:41                                 ` [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch Daniel McNeil
@ 2003-12-31 23:46                                   ` Andrew Morton
  2004-01-02  5:14                                     ` Suparna Bhattacharya
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2003-12-31 23:46 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: suparna, janetmor, pbadari, linux-aio, linux-kernel

Daniel McNeil <daniel@osdl.org> wrote:
>
> This is an update of the AIO fallback patch, with the bio_count race
>  fixed by changing bio_list_lock into bio_lock and using that for all
>  the bio fields.  I changed bio_count and bios_in_flight from atomics
>  into ints.  They are now protected by the bio_lock.  I fixed the race
>  in finished_one_bio() by leaving the bio_count at 1 until after the
>  dio_complete() and then doing the bio_count decrement and wakeup
>  holding the bio_lock.

Sob.  Daniel, please assume that I have an extremely small brain and forget
things very easily.  And that 2,000 patches have passed under my nose since
last I contemplated the direct-io code, OK?

What bio_count race?

And the patch seems to be doing more than your description describes?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch
  2003-12-31 23:42                                 ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Andrew Morton
@ 2004-01-02  4:20                                   ` Suparna Bhattacharya
  2004-01-02  4:36                                     ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2004-01-02  4:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Daniel McNeil, janetmor, pbadari, linux-aio, linux-kernel

On Wed, Dec 31, 2003 at 03:42:39PM -0800, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > The other potential race in filemap_fdatawait() was that it
> > removed the page from the locked list while waiting for a writeback
> > and if there was a 2nd filemap_fdatawait() running on another cpu,
> > it would not wait for the page being written since it would never see
> > it on the list.
> 
> That would only happen if one thread or the other was not running under
> i_sem.   The only path I see doing that is in generic_file_direct_IO()?

Yes, and we should simply fix generic_file_direct_IO to avoid doing so.
We issue filemap_fdatawait later with i_sem held anyway.

The race that we need to worry about is between background writeouts
(which don't take i_sem) and filemap_fdatawrite/filemap_fdatawait - i.e.
the first one discussed.

> 
> > +		/*
> > +		 * If the page is locked, it might be in the process of
> > +		 * being set up for writeback, but without PG_writeback set
> > +		 * and with PG_dirty cleared.
> > +		 * (PG_dirty is cleared BEFORE PG_writeback is set)
> > +		 * So, wait for PG_locked to clear, then start over.
> > +		 */
> > +		if (PageLocked(page)) {
> > +			page_cache_get(page);
> > +			spin_unlock(&mapping->page_lock);
> > +			wait_on_page_locked(page);
> > +			page_cache_release(page);
> > +			goto restart;
> > +		}
> 
> Why is this a problem which needs addressing here?  If some other thread is
> in the process of starting I/O against this page then the page must have
> been clean when this thread ran filemap_fdatawrite()?

This is the same race that we have been discussing (background writer
pulled this page off io_pages, put it on locked_pages but hasn't set
PG_writeback as yet).  To me it seemed that Daniel's solution was just an
alternative to what you proposed - i.e. adding lock_page() to
filemap_fdatawait.  I have to think a little about the fix -- but AFAICS
we are all talking about the same (real) problem here.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch
  2004-01-02  4:20                                   ` Suparna Bhattacharya
@ 2004-01-02  4:36                                     ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2004-01-02  4:36 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
> On Wed, Dec 31, 2003 at 03:42:39PM -0800, Andrew Morton wrote:
> > Daniel McNeil <daniel@osdl.org> wrote:
> > >
> > > The other potential race in filemap_fdatawait() was that it
> > > removed the page from the locked list while waiting for a writeback
> > > and if there was a 2nd filemap_fdatawait() running on another cpu,
> > > it would not wait for the page being written since it would never see
> > > it on the list.
> > 
> > That would only happen if one thread or the other was not running under
> > i_sem.   The only path I see doing that is in generic_file_direct_IO()?
> 
> Yes, and we should simply fix generic_file_direct_IO to avoid doing so.
> We anyway issue filemap_fdatawait later with i_sem held.
> 
> The race that we need to worry about is between background writeouts
> (which don't take i_sem) and filemap_fdatawrite/filemap_fdatawait - i.e
> the first one discussed.

Well Daniel has raised a second race here, between two concurrent
filemap_fdatawait() calls.  The background writeback code does not execute
filemap_fdatawait() anyway, so no prob.

Yes, extending i_sem coverage in the case of O_DIRECT reads should suit.  In
the (vastly) common case mapping->nrpages is zero anyway, so we shouldn't
even enter that code.

> > > +		/*
> > > +		 * If the page is locked, it might be in the process of
> > > +		 * being set up for writeback, but without PG_writeback set
> > > +		 * and with PG_dirty cleared.
> > > +		 * (PG_dirty is cleared BEFORE PG_writeback is set)
> > > +		 * So, wait for PG_locked to clear, then start over.
> > > +		 */
> > > +		if (PageLocked(page)) {
> > > +			page_cache_get(page);
> > > +			spin_unlock(&mapping->page_lock);
> > > +			wait_on_page_locked(page);
> > > +			page_cache_release(page);
> > > +			goto restart;
> > > +		}
> > 
> > Why is this a problem which needs addressing here?  If some other thread is
> > in the process of starting I/O against this page then the page must have
> > been clean when this thread ran filemap_fdatawrite()?
> 
> This is the same race that we have been discussing (background writer
> pulled this page off io_pages, put it on locked pages but hasn't set 
> PG_writeback as yet). To me it seemed that Daniel's solution was just an 
> alternative to what you proposed - i.e. adding lock_page() to filemap_fdatawait.
> I have to think a little about the fix -- AFAICS we are all talking
> about the same (real) problem here.

Yup.  Realish, anyway.  Unless we can demonstrate that this is the cause of
the O_DIRECT data-exposure problems, this race isn't really very
interesting.  It should be plugged though I guess.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2003-12-31 23:46                                   ` Andrew Morton
@ 2004-01-02  5:14                                     ` Suparna Bhattacharya
  2004-01-02  7:46                                       ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2004-01-02  5:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Daniel McNeil, janetmor, pbadari, linux-aio, linux-kernel


On Wed, Dec 31, 2003 at 03:46:48PM -0800, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > This is an update of the AIO fallback patch, with the bio_count race
> >  fixed by changing bio_list_lock into bio_lock and using that for all
> >  the bio fields.  I changed bio_count and bios_in_flight from atomics
> >  into ints.  They are now protected by the bio_lock.  I fixed the race
> >  in finished_one_bio() by leaving the bio_count at 1 until after the
> >  dio_complete() and then doing the bio_count decrement and wakeup holding
> >  the bio_lock.
> 
> Sob.  Daniel, please assume that I have an extremely small brain and forget
> things very easily.  And that 2,000 patches have passed under my nose since
> last I contemplated the direct-io code, OK?
> 
> What bio_count race?
> 
> And the patch seems to be doing more than your description describes?
> --

We need to combine this with my original patch description of the
aio-dio-fallback patch plus the follow-on patch (which had the race that
Daniel fixed), rolled together into one working patch.

Does the following sound better as a complete description?

-----------------------------------------------------------------

This patch ensures that when the DIO code falls back to buffered i/o 
after having submitted part of the i/o, then buffered i/o is issued only
for the remaining part of the request (i.e. the part not already
covered by DIO), rather than redo the entire i/o.

We need to be careful not to access dio fields if it's possible that 
the dio could already have been freed asynchronously during i/o 
completion. A tricky part of this involves plugging the window between 
the decrement of bio_count and accessing dio->waiter during i/o 
completion where the dio could get freed by the submission path. 
This potential "bio_count race" was tackled (by Daniel) by changing 
bio_list_lock into bio_lock and using that for all the bio fields. 
Now bio_count and bios_in_flight have been converted from atomics 
into int and are both protected by the bio_lock. The race in 
finished_one_bio() could thus be fixed by leaving the bio_count at 1 
until after the dio_complete() and then doing the bio_count decrement 
and wakeup holding the bio_lock. It appears that shifting to the
spin_lock instead of atomic_inc/decs is ok performance wise as 
well.
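
The decrement-and-wakeup ordering described above can be sketched as a self-contained userspace model.  This is plain C with the spinlock reduced to a flag; `dio_model` and friends are invented names for illustration, not the kernel's code:

```c
#include <assert.h>

/* Model of the fix: bio_count is a plain int protected by a "lock",
 * and the final decrement-and-wakeup happens only after the
 * completion work is done, so the submission path cannot free the
 * dio while the completion path is still touching it.  In the real
 * patch the lock is dio->bio_lock taken with spin_lock_irqsave(). */
struct dio_model {
	int locked;
	int bio_count;	/* nr bios to be completed, +1 submitter bias */
	int completed;	/* set once "dio_complete" has run */
};

static void model_lock(struct dio_model *d)   { assert(!d->locked); d->locked = 1; }
static void model_unlock(struct dio_model *d) { assert(d->locked);  d->locked = 0; }

void dio_model_init(struct dio_model *d, int nbios)
{
	d->locked = 0;
	d->bio_count = nbios + 1;	/* +1 bias held by the submitter */
	d->completed = 0;
}

/* Modelled after finished_one_bio() in the patch. */
void finished_one_bio_model(struct dio_model *d)
{
	model_lock(d);
	if (d->bio_count == 1) {
		/* Last reference: drop the lock, run completion, then
		 * retake the lock for the final decrement-and-wakeup.
		 * Keeping the count at 1 until completion is done is
		 * what closes the freeing window. */
		model_unlock(d);
		d->completed = 1;	/* stands in for dio_complete() */
		model_lock(d);
	}
	d->bio_count--;
	model_unlock(d);
}
```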

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31 11:17                             ` Andrew Morton
  2003-12-31 22:34                               ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Daniel McNeil
@ 2004-01-02  5:50                               ` Suparna Bhattacharya
  2004-01-02  7:31                                 ` Andrew Morton
                                                   ` (2 more replies)
  2004-01-11 23:14                               ` Janet Morgan
  2 siblings, 3 replies; 58+ messages in thread
From: Suparna Bhattacharya @ 2004-01-02  5:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

On Wed, Dec 31, 2003 at 03:17:36AM -0800, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > Let me actually think about this a bit.
> 
> Nasty.  The same race is present in 2.4.x...
> 
> How's about we start new I/O in filemap_fdatawait() if the page is dirty?
> 

Makes sense to me.
There's a chance that this could explain why Daniel saw exposures even 
with his fix. 

Would be interesting to see his results with your patch.

Though we might as well plug this anyway ?

> 
> diff -puN mm/filemap.c~a mm/filemap.c
> --- 25/mm/filemap.c~a	2003-12-31 03:10:29.000000000 -0800
> +++ 25-akpm/mm/filemap.c	2003-12-31 03:17:05.000000000 -0800
> @@ -206,7 +206,13 @@ restart:
>  		page_cache_get(page);
>  		spin_unlock(&mapping->page_lock);
>  
> -		wait_on_page_writeback(page);
> +		lock_page(page);
> +		if (PageDirty(page) && mapping->a_ops->writepage) {
> +			write_one_page(page, 1);
> +		} else {
> +			wait_on_page_writeback(page);
> +			unlock_page(page);

Would we lose anything if we unlock_page() before wait_on_page_writeback() ?
I was thinking about the corresponding fix in sync_page_range, and it
would make life easier for retry based fs-AIO if we could move the
unlock_page before the wait.

> +		}
>  		if (PageError(page))
>  			ret = -EIO;
>  
> 
> 

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2004-01-02  5:50                               ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Suparna Bhattacharya
@ 2004-01-02  7:31                                 ` Andrew Morton
  2004-01-05 13:49                                 ` Marcelo Tosatti
  2004-03-29 15:44                                 ` Marcelo Tosatti
  2 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2004-01-02  7:31 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
> On Wed, Dec 31, 2003 at 03:17:36AM -0800, Andrew Morton wrote:
> > Andrew Morton <akpm@osdl.org> wrote:
> > >
> > > Let me actually think about this a bit.
> > 
> > Nasty.  The same race is present in 2.4.x...
> > 
> > How's about we start new I/O in filemap_fdatawait() if the page is dirty?
> > 
> 
> Makes sense to me.
> There's a chance that this could explain why Daniel saw exposures even 
> with his fix. 
> 
> Would be interesting to see his results with your patch.
> 
> Though we might as well plug this anyway ?

Yes, we should.  I'm not dreadfully keen on starting I/O from within
filemap_fdatawait() though - it seems "wrong" somehow.

> > 
> > diff -puN mm/filemap.c~a mm/filemap.c
> > --- 25/mm/filemap.c~a	2003-12-31 03:10:29.000000000 -0800
> > +++ 25-akpm/mm/filemap.c	2003-12-31 03:17:05.000000000 -0800
> > @@ -206,7 +206,13 @@ restart:
> >  		page_cache_get(page);
> >  		spin_unlock(&mapping->page_lock);
> >  
> > -		wait_on_page_writeback(page);
> > +		lock_page(page);
> > +		if (PageDirty(page) && mapping->a_ops->writepage) {
> > +			write_one_page(page, 1);
> > +		} else {
> > +			wait_on_page_writeback(page);
> > +			unlock_page(page);
> 
> Would we lose anything if we unlock_page() before wait_on_page_writeback() ?

No, that should be OK.




* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2004-01-02  5:14                                     ` Suparna Bhattacharya
@ 2004-01-02  7:46                                       ` Andrew Morton
  2004-01-05  3:55                                         ` Suparna Bhattacharya
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2004-01-02  7:46 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
>  Does the following sound better as complete description ?

It does, thanks.  But it does not shed a lot of light on the filemap.c
changes - what's going on there?

What is the significance of `written > count' in there, and of `dio->result
> dio->size' in finished_one_bio()?  How can these states come about?



* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2004-01-02  7:46                                       ` Andrew Morton
@ 2004-01-05  3:55                                         ` Suparna Bhattacharya
  2004-01-05  5:06                                           ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2004-01-05  3:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

On Thu, Jan 01, 2004 at 11:46:34PM -0800, Andrew Morton wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> >
> >  Does the following sound better as complete description ?
> 
> It does, thanks.  But it does not shed a lot of light on the filemap.c
> changes - what's going on there?
> 
> What is the significance of `written > count' in there, and of `dio->result
> > dio->size' in finished_one_bio()?  How can these states come about?
> 

Both of these are checks to determine whether all or only a part of
the request was serviced by DIO, leaving the rest to be serviced by
falling back to buffered I/O.

Earlier, we would decide this based on whether dio->result == -ENOTBLK
and written == -ENOTBLK respectively.  However, we now also need the
result to indicate how much actually got written, even in the -ENOTBLK
case, so that we can issue buffered i/o only for the remaining part of
the request; dio->result and written therefore had to be modified to
reflect the number of bytes transferred by DIO even in the fallback to
buffered i/o case.  So we need checks like this to find out whether
there is some i/o left to be issued via buffered i/o.

If dio->result == dio->size, then the entire request has been handled
by DIO and we don't have to bother about falling back to buffered i/o.
Otherwise, even for AIO, we need to force a wait for the part of the
request already issued.

If written == count, then the entire request has been handled by DIO;
otherwise, if written < count, it means that we need to fallthrough to
buffered i/o for the remaining bytes.

What about short writes?  Most real errors during a DIO write are
reflected in the return value directly, so we should be OK (worst case
we'd end up with an additional but harmless attempt via buffered i/o;
however, I didn't spot a case where that would actually occur).

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2004-01-05  3:55                                         ` Suparna Bhattacharya
@ 2004-01-05  5:06                                           ` Andrew Morton
  2004-01-05  5:28                                             ` Suparna Bhattacharya
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2004-01-05  5:06 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
> > What is the significance of `written > count' in there, and of `dio->result
>  > > dio->size' in finished_one_bio()?  How can these states come about?
>  > 
> 
>  Both of these are checks to determine whether all or only a part of
>  the request was serviced by DIO, leaving the rest to be serviced by
>  falling back to buffered I/O.
> 
>  Earlier, we would decide this based on whether dio->result == -ENOTBLK
>  and written == -ENOTBLK respectively. However, now that we also need the 
>  result to indicate how much actually got written even in -ENOTBLK case, 
>  so that we can issue buffered i/o only for the remaining part of the request,
>  dio->result and written had to be modified to reflect the number of bytes 
>  transferred by DIO even for the fallback to buffered i/o case. 
>  So we need checks like this to find out whether there is some i/o left
>  to be issued via buffered i/o.

Sure.  But the generic_file_aio_write_nolock() code is doing this:

		if (written >= count && !is_sync_kiocb(iocb))
			written = -EIOCBQUEUED;
		if (written < 0 || written >= count)
			goto out_status;


Under what circumstances can `written' (the amount which was written) be
greater than `count' (the amount to write)?


* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2004-01-05  5:06                                           ` Andrew Morton
@ 2004-01-05  5:28                                             ` Suparna Bhattacharya
  2004-01-05  5:28                                               ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2004-01-05  5:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

On Sun, Jan 04, 2004 at 09:06:42PM -0800, Andrew Morton wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> >
> > > What is the significance of `written > count' in there, and of `dio->result
> >  > > dio->size' in finished_one_bio()?  How can these states come about?
> >  > 
> > 
> >  Both of these are checks to determine whether all or only a part of
> >  the request was serviced by DIO, leaving the rest to be serviced by
> >  falling back to buffered I/O.
> > 
> >  Earlier, we would decide this based on whether dio->result == -ENOTBLK
> >  and written == -ENOTBLK respectively. However, now that we also need the 
> >  result to indicate how much actually got written even in -ENOTBLK case, 
> >  so that we can issue buffered i/o only for the remaining part of the request,
> >  dio->result and written had to be modified to reflect the number of bytes 
> >  transferred by DIO even for the fallback to buffered i/o case. 
> >  So we need checks like this to find out whether there is some i/o left
> >  to be issued via buffered i/o.
> 
> Sure.  But the generic_file_aio_write_nolock() code is doing this:
> 
> 		if (written >= count && !is_sync_kiocb(iocb))
> 			written = -EIOCBQUEUED;
> 		if (written < 0 || written >= count)
> 			goto out_status;
> 
> 
> Under what circumstances can `written' (the amount which was written) be
> greater than `count' (the amount to write)?

None. The '>' situation should never occur.

This is just being explicit about covering the "not less than" case
as a whole, and making sure we do not fall through to buffered i/o in
that case, i.e. it's the same as:
if (!(written < count) && !is_sync_kiocb(iocb))

Is that any less confusing?  Or would you rather just replace the '>='
by '=='?

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India



* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2004-01-05  5:28                                             ` Suparna Bhattacharya
@ 2004-01-05  5:28                                               ` Andrew Morton
  2004-01-05  6:06                                                 ` Suparna Bhattacharya
  2004-01-05  6:14                                                 ` Lincoln Dale
  0 siblings, 2 replies; 58+ messages in thread
From: Andrew Morton @ 2004-01-05  5:28 UTC (permalink / raw)
  To: suparna; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
> > Sure.  But the generic_file_aio_write_nolock() code is doing this:
>  > 
>  > 		if (written >= count && !is_sync_kiocb(iocb))
>  > 			written = -EIOCBQUEUED;
>  > 		if (written < 0 || written >= count)
>  > 			goto out_status;
>  > 
>  > 
>  > Under what circumstances can `written' (the amount which was written) be
>  > greater than `count' (the amount to write)?
> 
>  None. The '>' situation should never occur.
> 
>  This is just being explicit about covering the "not less than" case
>  as a whole, and making sure we do not fall through to buffered i/o in
>  that case, i.e its the same as:
>  if (!(written < count) && !is_sync_kiocb(iocb))
>
>  Is that any less confusing ? Or would you rather just replace the '>=" by
>  "=='.

Well the original confused the heck out of me!  yes, `if (written == count)'
should be fine: it says exactly what we want it to say.



* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2004-01-05  5:28                                               ` Andrew Morton
@ 2004-01-05  6:06                                                 ` Suparna Bhattacharya
  2004-01-05  6:14                                                 ` Lincoln Dale
  1 sibling, 0 replies; 58+ messages in thread
From: Suparna Bhattacharya @ 2004-01-05  6:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: daniel, janetmor, pbadari, linux-aio, linux-kernel

On Sun, Jan 04, 2004 at 09:28:55PM -0800, Andrew Morton wrote:
> Suparna Bhattacharya <suparna@in.ibm.com> wrote:
> >
> > > Sure.  But the generic_file_aio_write_nolock() code is doing this:
> >  > 
> >  > 		if (written >= count && !is_sync_kiocb(iocb))
> >  > 			written = -EIOCBQUEUED;
> >  > 		if (written < 0 || written >= count)
> >  > 			goto out_status;
> >  > 
> >  > 
> >  > Under what circumstances can `written' (the amount which was written) be
> >  > greater than `count' (the amount to write)?
> > 
> >  None. The '>' situation should never occur.
> > 
> >  This is just being explicit about covering the "not less than" case
> >  as a whole, and making sure we do not fall through to buffered i/o in
> >  that case, i.e its the same as:
> >  if (!(written < count) && !is_sync_kiocb(iocb))
> >
> >  Is that any less confusing ? Or would you rather just replace the '>=" by
> >  "=='.
> 
> Well the original confused the heck out of me!  yes, `if (written == count)'

Sorry about that.

> should be fine: it says exactly what we want it to say.
> 

I made that change.  Here's the updated patch with the change
description.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India

-----------------------------------------------------------------

This patch ensures that when the DIO code falls back to buffered i/o 
after having submitted part of the i/o, then buffered i/o is issued only
for the remaining part of the request (i.e. the part not already
covered by DIO), rather than redo the entire i/o. Now, instead of 
returning written == -ENOTBLK, generic_file_direct_IO returns the 
number of bytes already handled by DIO, so that the caller knows 
how much of the I/O is left to be handled via fallback to buffered
write.

We need to be careful not to access dio fields if it's possible that 
the dio could already have been freed asynchronously during i/o 
completion. A tricky part of this involves plugging the window between 
the decrement of bio_count and accessing dio->waiter during i/o 
completion where the dio could get freed by the submission path. 
This potential "bio_count race" was tackled (by Daniel) by changing 
bio_list_lock into bio_lock and using that for all the bio fields. 
Now bio_count and bios_in_flight have been converted from atomics 
into int and are both protected by the bio_lock. The race in 
finished_one_bio() could thus be fixed by leaving the bio_count at 1 
until after the dio_complete() and then doing the bio_count decrement 
and wakeup holding the bio_lock. It appears that shifting to the
spin_lock instead of atomic_inc/decs is ok performance wise as 
well.

-------------------------------------------------------------

diff -rupN -X /home/daniel/dontdiff linux-2.6.1-rc1-mm1/fs/direct-io.c linux-2.6.1-rc1-mm1.ddm/fs/direct-io.c
--- linux-2.6.1-rc1-mm1/fs/direct-io.c	2003-12-31 10:52:45.940193469 -0800
+++ linux-2.6.1-rc1-mm1.ddm/fs/direct-io.c	2003-12-31 13:29:51.260432002 -0800
@@ -74,6 +74,7 @@ struct dio {
 					   been performed at the start of a
 					   write */
 	int pages_in_io;		/* approximate total IO pages */
+	size_t	size;			/* total request size (doesn't change)*/
 	sector_t block_in_file;		/* Current offset into the underlying
 					   file in dio_block units. */
 	unsigned blocks_available;	/* At block_in_file.  changes */
@@ -115,9 +116,9 @@ struct dio {
 	int page_errors;		/* errno from get_user_pages() */
 
 	/* BIO completion state */
-	atomic_t bio_count;		/* nr bios to be completed */
-	atomic_t bios_in_flight;	/* nr bios in flight */
-	spinlock_t bio_list_lock;	/* protects bio_list */
+	spinlock_t bio_lock;		/* protects BIO fields below */
+	int bio_count;			/* nr bios to be completed */
+	int bios_in_flight;		/* nr bios in flight */
 	struct bio *bio_list;		/* singly linked via bi_private */
 	struct task_struct *waiter;	/* waiting task (NULL if none) */
 
@@ -221,20 +222,38 @@ static void dio_complete(struct dio *dio
  */
 static void finished_one_bio(struct dio *dio)
 {
-	if (atomic_dec_and_test(&dio->bio_count)) {
+	unsigned long flags;
+
+	spin_lock_irqsave(&dio->bio_lock, flags);
+	if (dio->bio_count == 1) {
 		if (dio->is_async) {
+			/*
+			 * Last reference to the dio is going away.
+			 * Drop spinlock and complete the DIO.
+			 */
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
 			/* Complete AIO later if falling back to buffered i/o */
-			if (dio->result != -ENOTBLK) {
+			if (dio->result == dio->size || dio->rw == READ) {
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
+				return;
 			} else {
+				/*
+				 * Falling back to buffered
+				 */
+				spin_lock_irqsave(&dio->bio_lock, flags);
+				dio->bio_count--;
 				if (dio->waiter)
 					wake_up_process(dio->waiter);
+				spin_unlock_irqrestore(&dio->bio_lock, flags);
+				return;
 			}
 		}
 	}
+	dio->bio_count--;
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 }
 
 static int dio_bio_complete(struct dio *dio, struct bio *bio);
@@ -268,13 +287,13 @@ static int dio_bio_end_io(struct bio *bi
 	if (bio->bi_size)
 		return 1;
 
-	spin_lock_irqsave(&dio->bio_list_lock, flags);
+	spin_lock_irqsave(&dio->bio_lock, flags);
 	bio->bi_private = dio->bio_list;
 	dio->bio_list = bio;
-	atomic_dec(&dio->bios_in_flight);
-	if (dio->waiter && atomic_read(&dio->bios_in_flight) == 0)
+	dio->bios_in_flight--;
+	if (dio->waiter && dio->bios_in_flight == 0)
 		wake_up_process(dio->waiter);
-	spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	return 0;
 }
 
@@ -307,10 +326,13 @@ dio_bio_alloc(struct dio *dio, struct bl
 static void dio_bio_submit(struct dio *dio)
 {
 	struct bio *bio = dio->bio;
+	unsigned long flags;
 
 	bio->bi_private = dio;
-	atomic_inc(&dio->bio_count);
-	atomic_inc(&dio->bios_in_flight);
+	spin_lock_irqsave(&dio->bio_lock, flags);
+	dio->bio_count++;
+	dio->bios_in_flight++;
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	if (dio->is_async && dio->rw == READ)
 		bio_set_pages_dirty(bio);
 	submit_bio(dio->rw, bio);
@@ -336,22 +358,22 @@ static struct bio *dio_await_one(struct 
 	unsigned long flags;
 	struct bio *bio;
 
-	spin_lock_irqsave(&dio->bio_list_lock, flags);
+	spin_lock_irqsave(&dio->bio_lock, flags);
 	while (dio->bio_list == NULL) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (dio->bio_list == NULL) {
 			dio->waiter = current;
-			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			blk_run_queues();
 			io_schedule();
-			spin_lock_irqsave(&dio->bio_list_lock, flags);
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			dio->waiter = NULL;
 		}
 		set_current_state(TASK_RUNNING);
 	}
 	bio = dio->bio_list;
 	dio->bio_list = bio->bi_private;
-	spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	return bio;
 }
 
@@ -393,7 +415,12 @@ static int dio_await_completion(struct d
 	if (dio->bio)
 		dio_bio_submit(dio);
 
-	while (atomic_read(&dio->bio_count)) {
+	/*
+	 * The bio_lock is not held for the read of bio_count.
+	 * This is ok since it is the dio_bio_complete() that changes
+	 * bio_count.
+	 */
+	while (dio->bio_count) {
 		struct bio *bio = dio_await_one(dio);
 		int ret2;
 
@@ -420,10 +447,10 @@ static int dio_bio_reap(struct dio *dio)
 			unsigned long flags;
 			struct bio *bio;
 
-			spin_lock_irqsave(&dio->bio_list_lock, flags);
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			bio = dio->bio_list;
 			dio->bio_list = bio->bi_private;
-			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			ret = dio_bio_complete(dio, bio);
 		}
 		dio->reap_counter = 0;
@@ -889,6 +916,7 @@ direct_io_worker(int rw, struct kiocb *i
 	dio->blkbits = blkbits;
 	dio->blkfactor = inode->i_blkbits - blkbits;
 	dio->start_zero_done = 0;
+	dio->size = 0;
 	dio->block_in_file = offset >> blkbits;
 	dio->blocks_available = 0;
 	dio->cur_page = NULL;
@@ -913,9 +941,9 @@ direct_io_worker(int rw, struct kiocb *i
 	 * (or synchronous) device could take the count to zero while we're
 	 * still submitting BIOs.
 	 */
-	atomic_set(&dio->bio_count, 1);
-	atomic_set(&dio->bios_in_flight, 0);
-	spin_lock_init(&dio->bio_list_lock);
+	dio->bio_count = 1;
+	dio->bios_in_flight = 0;
+	spin_lock_init(&dio->bio_lock);
 	dio->bio_list = NULL;
 	dio->waiter = NULL;
 
@@ -925,7 +953,7 @@ direct_io_worker(int rw, struct kiocb *i
 
 	for (seg = 0; seg < nr_segs; seg++) {
 		user_addr = (unsigned long)iov[seg].iov_base;
-		bytes = iov[seg].iov_len;
+		dio->size += bytes = iov[seg].iov_len;
 
 		/* Index into the first page of the first block */
 		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
@@ -956,6 +984,13 @@ direct_io_worker(int rw, struct kiocb *i
 		}
 	} /* end iovec loop */
 
+	if (ret == -ENOTBLK && rw == WRITE) {
+		/*
+		 * The remaining part of the request will be
+		 * handled by buffered I/O when we return
+		 */
+		ret = 0;
+	}
 	/*
 	 * There may be some unwritten disk at the end of a part-written
 	 * fs-block-sized block.  Go zero that now.
@@ -991,32 +1026,35 @@ direct_io_worker(int rw, struct kiocb *i
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;	/* Bytes written */
-		if (ret == -ENOTBLK) {
-			/*
-			 * The request will be reissued via buffered I/O
-			 * when we return; Any I/O already issued
-			 * effectively becomes redundant.
-			 */
-			dio->result = ret;
+		int should_wait = 0;
+		
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (ret == -ENOTBLK) {
+		if (should_wait) {
+			unsigned long flags;
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
 			 * before returning to fallback on buffered I/O
 			 */
+
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			while (atomic_read(&dio->bio_count)) {
+			while (dio->bio_count) {
+				spin_unlock_irqrestore(&dio->bio_lock, flags);
 				io_schedule();
+				spin_lock_irqsave(&dio->bio_lock, flags);
 				set_current_state(TASK_UNINTERRUPTIBLE);
 			}
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);
@@ -1038,7 +1076,8 @@ direct_io_worker(int rw, struct kiocb *i
 		}
 		dio_complete(dio, offset, ret);
 		/* We could have also come here on an AIO file extend */
-		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
+		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 && 
+			dio->result < dio->size))
 			aio_complete(iocb, ret, 0);
 		kfree(dio);
 	}
diff -rupN -X /home/daniel/dontdiff linux-2.6.1-rc1-mm1/mm/filemap.c linux-2.6.1-rc1-mm1.ddm/mm/filemap.c
--- linux-2.6.1-rc1-mm1/mm/filemap.c	2003-12-31 10:52:47.039988554 -0800
+++ linux-2.6.1-rc1-mm1.ddm/mm/filemap.c	2003-12-31 13:29:51.263431439 -0800
@@ -1908,14 +1908,16 @@ __generic_file_aio_write_nolock(struct k
 		 */
 		if (written >= 0 && file->f_flags & O_SYNC)
 			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-		if (written >= 0 && !is_sync_kiocb(iocb))
+		if (written == count && !is_sync_kiocb(iocb))
 			written = -EIOCBQUEUED;
-		if (written != -ENOTBLK)
+		if (written < 0 || written == count)
 			goto out_status;
 		/*
 		 * direct-io write to a hole: fall through to buffered I/O
+		 * for completing the rest of the request.
 		 */
-		written = 0;
+		pos += written;
+		count -= written;
 	}
 
 	buf = iov->iov_base;


* Re: [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch
  2004-01-05  5:28                                               ` Andrew Morton
  2004-01-05  6:06                                                 ` Suparna Bhattacharya
@ 2004-01-05  6:14                                                 ` Lincoln Dale
  1 sibling, 0 replies; 58+ messages in thread
From: Lincoln Dale @ 2004-01-05  6:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: suparna, daniel, janetmor, pbadari, linux-aio, linux-kernel

At 04:28 PM 5/01/2004, Andrew Morton wrote:
> >  None. The '>' situation should never occur.
> >
> >  This is just being explicit about covering the "not less than" case
> >  as a whole, and making sure we do not fall through to buffered i/o in
> >  that case, i.e its the same as:
> >  if (!(written < count) && !is_sync_kiocb(iocb))
> >
> >  Is that any less confusing ? Or would you rather just replace the '>=" by
> >  "=='.
>
>Well the original confused the heck out of me!  yes, `if (written == count)'
>should be fine: it says exactly what we want it to say.

perhaps the original was defensive programming against a single-bit-error 
hitting 'written' .... :-)


cheers,

lincoln.
PS. just kidding



* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2004-01-02  5:50                               ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Suparna Bhattacharya
  2004-01-02  7:31                                 ` Andrew Morton
@ 2004-01-05 13:49                                 ` Marcelo Tosatti
  2004-01-05 20:27                                   ` Andrew Morton
  2004-03-29 15:44                                 ` Marcelo Tosatti
  2 siblings, 1 reply; 58+ messages in thread
From: Marcelo Tosatti @ 2004-01-05 13:49 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, daniel, janetmor, pbadari, linux-aio, linux-kernel



On Fri, 2 Jan 2004, Suparna Bhattacharya wrote:

> On Wed, Dec 31, 2003 at 03:17:36AM -0800, Andrew Morton wrote:
> > Andrew Morton <akpm@osdl.org> wrote:
> > >
> > > Let me actually think about this a bit.
> >
> > Nasty.  The same race is present in 2.4.x...

filemap_fdatawait() is always called with i_sem held and
there is no "!PG_dirty and !PG_writeback" window.

Where does the race lie in 2.4?

Daniel, it would be interesting to know whether the direct IO tests also
fail on 2.4.

> > How's about we start new I/O in filemap_fdatawait() if the page is
> > dirty?


* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2004-01-05 13:49                                 ` Marcelo Tosatti
@ 2004-01-05 20:27                                   ` Andrew Morton
  0 siblings, 0 replies; 58+ messages in thread
From: Andrew Morton @ 2004-01-05 20:27 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: suparna, daniel, janetmor, pbadari, linux-aio, linux-kernel

Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
>
> 
> 
> On Fri, 2 Jan 2004, Suparna Bhattacharya wrote:
> 
> > On Wed, Dec 31, 2003 at 03:17:36AM -0800, Andrew Morton wrote:
> > > Andrew Morton <akpm@osdl.org> wrote:
> > > >
> > > > Let me actually think about this a bit.
> > >
> > > Nasty.  The same race is present in 2.4.x...
> 
> filemap_fdatawait() is always called with i_sem held and
> there is no "!PG_dirty and !PG_writeback" window.
> 
> Where does the race lies in 2.4 ?

kupdate and bdflush run filemap_fdatawait() without i_sem.  If kupdate has
moved a page off locked_pages to wait on it, a caller of fsync() just won't
see that page at all, and fsync can return while I/O is still in flight.

Given that this only applies to mmap data in 2.4 it doesn't seem
super-important.  A good way to fix it would be to not call
filemap_fdatawait() *at all* if the caller is kupdate or bdflush - it's
silly.   Could use a PF_foo flag for this?


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
  2003-12-31  6:09           ` Suparna Bhattacharya
@ 2004-01-08 23:55             ` Daniel McNeil
  2004-01-09  3:55               ` Suparna Bhattacharya
  0 siblings, 1 reply; 58+ messages in thread
From: Daniel McNeil @ 2004-01-08 23:55 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Janet Morgan, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

On Tue, 2003-12-30 at 22:09, Suparna Bhattacharya wrote:

> Since the first filemap_write_and_wait call is redundant and somewhat
> suspect since its called w/o i_sem (I can think of unexpected side effects
> on parallel filemap_write_and_wait calls), have you thought of disabling that
> and then trying to see if you can still recreate the problem ? It may
> not make a difference, but it seems like the right thing to do and could
> at least simplify some of the debugging.
> 
> Regards
> Suparna
> 


Ok, I retried my test without the filemap_write_and_wait() that is not
protected by i_sem and the test still sees uninitialized data.  I'm
still running with test10-mm1 + all the patches I sent out before.
I haven't tried 2.6.0-rc*-mm1 yet.  I need to move all my debug code
over to the latest mm kernel.  I also did not want to change too much
at the same time.

Daniel




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
  2004-01-08 23:55             ` Daniel McNeil
@ 2004-01-09  3:55               ` Suparna Bhattacharya
  2004-02-05  1:39                 ` [PATCH 2.6.2-rc3-mm1] DIO read race fix Daniel McNeil
  0 siblings, 1 reply; 58+ messages in thread
From: Suparna Bhattacharya @ 2004-01-09  3:55 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Janet Morgan, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

On Thu, Jan 08, 2004 at 03:55:44PM -0800, Daniel McNeil wrote:
> On Tue, 2003-12-30 at 22:09, Suparna Bhattacharya wrote:
> 
> > Since the first filemap_write_and_wait call is redundant and somewhat
> > suspect since its called w/o i_sem (I can think of unexpected side effects
> > on parallel filemap_write_and_wait calls), have you thought of disabling that
> > and then trying to see if you can still recreate the problem ? It may
> > not make a difference, but it seems like the right thing to do and could
> > at least simplify some of the debugging.
> > 
> > Regards
> > Suparna
> > 
> 
> 
> Ok, I retried my test without the filemap_write_and_wait() that is not
> protected by i_sem and the test still sees uninitialized data.  I'm
> still running with test10-mm1 + all the patches I sent out before.
> I'm haven't tried 2.6.0-rc*-mm1 yet.  I need to move all my debug code
> over to the latest mm kernel.  I also did not want to change too much
> at same time.

Did you have a chance to try akpm's patch for filemap_fdatawait?
(you should be able to apply it to the same kernel that you are running
with, I think)

Regards
Suparna

> 
> Daniel
> 
> 
> 

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2003-12-31 11:17                             ` Andrew Morton
  2003-12-31 22:34                               ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Daniel McNeil
  2004-01-02  5:50                               ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Suparna Bhattacharya
@ 2004-01-11 23:14                               ` Janet Morgan
  2004-01-11 23:44                                 ` Andrew Morton
  2004-01-13  4:12                                 ` Janet Morgan
  2 siblings, 2 replies; 58+ messages in thread
From: Janet Morgan @ 2004-01-11 23:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: suparna, daniel, pbadari, linux-aio, linux-kernel

Andrew Morton wrote:

>Andrew Morton <akpm@osdl.org> wrote:
>  
>
>>Let me actually think about this a bit.
>>    
>>
>
>Nasty.  The same race is present in 2.4.x...
>
>How's about we start new I/O in filemap_fdatawait() if the page is dirty?
>
>
>diff -puN mm/filemap.c~a mm/filemap.c
>--- 25/mm/filemap.c~a	2003-12-31 03:10:29.000000000 -0800
>+++ 25-akpm/mm/filemap.c	2003-12-31 03:17:05.000000000 -0800
>@@ -206,7 +206,13 @@ restart:
> 		page_cache_get(page);
> 		spin_unlock(&mapping->page_lock);
> 
>-		wait_on_page_writeback(page);
>+		lock_page(page);
>+		if (PageDirty(page) && mapping->a_ops->writepage) {
>+			write_one_page(page, 1);
>+		} else {
>+			wait_on_page_writeback(page);
>+			unlock_page(page);
>+		}
> 		if (PageError(page))
> 			ret = -EIO;
> 
>
>  
>
That fixed the problem!  Stephen's testcase is running successfully on 
2.6.1-mm1 plus your patch -- no more uninitialized data!

Thanks!
-Janet
P.S. Sorry to be so late testing this (was on vacation).




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2004-01-11 23:14                               ` Janet Morgan
@ 2004-01-11 23:44                                 ` Andrew Morton
  2004-01-12 18:00                                   ` filemap_fdatawait.patch Daniel McNeil
  2004-01-12 19:39                                   ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Janet Morgan
  2004-01-13  4:12                                 ` Janet Morgan
  1 sibling, 2 replies; 58+ messages in thread
From: Andrew Morton @ 2004-01-11 23:44 UTC (permalink / raw)
  To: Janet Morgan; +Cc: suparna, daniel, pbadari, linux-aio, linux-kernel

Janet Morgan <janetmor@us.ibm.com> wrote:
>
>  >diff -puN mm/filemap.c~a mm/filemap.c
>  >--- 25/mm/filemap.c~a	2003-12-31 03:10:29.000000000 -0800
>  >+++ 25-akpm/mm/filemap.c	2003-12-31 03:17:05.000000000 -0800
>  >@@ -206,7 +206,13 @@ restart:
>  > 		page_cache_get(page);
>  > 		spin_unlock(&mapping->page_lock);
>  > 
>  >-		wait_on_page_writeback(page);
>  >+		lock_page(page);
>  >+		if (PageDirty(page) && mapping->a_ops->writepage) {
>  >+			write_one_page(page, 1);
>  >+		} else {
>  >+			wait_on_page_writeback(page);
>  >+			unlock_page(page);
>  >+		}
>  > 		if (PageError(page))
>  > 			ret = -EIO;
>  > 
>  >
>  >  
>  >
>  That fixed the problem!  Stephen's testcase is running successfully on 
>  2.6.1-mm1 plus your patch -- no more uninitialized data!

Could you please test 2.6.1-mm2 with that patch?  If that works, send the
patch back to me?  (I lost it ;))

It still leaves the AIO situation open.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: filemap_fdatawait.patch
  2004-01-11 23:44                                 ` Andrew Morton
@ 2004-01-12 18:00                                   ` Daniel McNeil
  2004-01-12 19:39                                   ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Janet Morgan
  1 sibling, 0 replies; 58+ messages in thread
From: Daniel McNeil @ 2004-01-12 18:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Janet Morgan, Suparna Bhattacharya, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List

I've started a test with this patch on 2.6.1-mm2 this morning.
I've seen previous runs take 24 hours to hit corruption,
so I'll let you know tomorrow how things went.

Daniel

On Sun, 2004-01-11 at 15:44, Andrew Morton wrote:
> Janet Morgan <janetmor@us.ibm.com> wrote:
> >
> >  >diff -puN mm/filemap.c~a mm/filemap.c
> >  >--- 25/mm/filemap.c~a	2003-12-31 03:10:29.000000000 -0800
> >  >+++ 25-akpm/mm/filemap.c	2003-12-31 03:17:05.000000000 -0800
> >  >@@ -206,7 +206,13 @@ restart:
> >  > 		page_cache_get(page);
> >  > 		spin_unlock(&mapping->page_lock);
> >  > 
> >  >-		wait_on_page_writeback(page);
> >  >+		lock_page(page);
> >  >+		if (PageDirty(page) && mapping->a_ops->writepage) {
> >  >+			write_one_page(page, 1);
> >  >+		} else {
> >  >+			wait_on_page_writeback(page);
> >  >+			unlock_page(page);
> >  >+		}
> >  > 		if (PageError(page))
> >  > 			ret = -EIO;
> >  > 
> >  >
> >  >  
> >  >
> >  That fixed the problem!  Stephen's testcase is running successfully on 
> >  2.6.1-mm1 plus your patch -- no more uninitialized data!
> 
> Could you please test 2.6.1-mm2 with that patch?  If that works, send the
> patch back to me?  (I lost it ;))
> 
> It still leaves the AIO situation open.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2004-01-11 23:44                                 ` Andrew Morton
  2004-01-12 18:00                                   ` filemap_fdatawait.patch Daniel McNeil
@ 2004-01-12 19:39                                   ` Janet Morgan
  2004-01-12 19:46                                     ` Daniel McNeil
  1 sibling, 1 reply; 58+ messages in thread
From: Janet Morgan @ 2004-01-12 19:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: suparna, daniel, pbadari, linux-aio, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1033 bytes --]

Andrew Morton wrote:

>Janet Morgan <janetmor@us.ibm.com> wrote:
>
>> >diff -puN mm/filemap.c~a mm/filemap.c
>> >--- 25/mm/filemap.c~a	2003-12-31 03:10:29.000000000 -0800
>> >+++ 25-akpm/mm/filemap.c	2003-12-31 03:17:05.000000000 -0800
>> >@@ -206,7 +206,13 @@ restart:
>> > 		page_cache_get(page);
>> > 		spin_unlock(&mapping->page_lock);
>> > 
>> >-		wait_on_page_writeback(page);
>> >+		lock_page(page);
>> >+		if (PageDirty(page) && mapping->a_ops->writepage) {
>> >+			write_one_page(page, 1);
>> >+		} else {
>> >+			wait_on_page_writeback(page);
>> >+			unlock_page(page);
>> >+		}
>> > 		if (PageError(page))
>> > 			ret = -EIO;
>> > 
>> >
>> >  
>> >
>> That fixed the problem!  Stephen's testcase is running successfully on 
>> 2.6.1-mm1 plus your patch -- no more uninitialized data!
>>
>
>Could you please test 2.6.1-mm2 with that patch?  If that works, send the
>patch back to me?  (I lost it ;))
>
>
I see uninitialized data when I test the patch (attached) on 2.6.1-mm2.  
Any thoughts on a next step?

Thanks,
-Janet

[-- Attachment #2: filemap_fdatawait.patch --]
[-- Type: text/plain, Size: 507 bytes --]

diff -puN mm/filemap.c~a mm/filemap.c
--- 25/mm/filemap.c~a   2003-12-31 03:10:29.000000000 -0800
+++ 25-akpm/mm/filemap.c        2003-12-31 03:17:05.000000000 -0800
@@ -209,7 +209,13 @@ restart:
		page_cache_get(page);
		spin_unlock(&mapping->page_lock);

-		wait_on_page_writeback(page);
+		lock_page(page);
+		if (PageDirty(page) && mapping->a_ops->writepage) {
+			write_one_page(page, 1);
+		} else {
+			wait_on_page_writeback(page);
+			unlock_page(page);
+		}
		if (PageError(page))
			ret = -EIO;


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2004-01-12 19:39                                   ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Janet Morgan
@ 2004-01-12 19:46                                     ` Daniel McNeil
  0 siblings, 0 replies; 58+ messages in thread
From: Daniel McNeil @ 2004-01-12 19:46 UTC (permalink / raw)
  To: Janet Morgan
  Cc: Andrew Morton, Suparna Bhattacharya, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List

I just got a test failure on 2.6.1-mm2 + patch as well.
It took nearly 2 hours with 6 copies running to hit it.

Daniel

On Mon, 2004-01-12 at 11:39, Janet Morgan wrote:
> Andrew Morton wrote:
> 
> >Janet Morgan <janetmor@us.ibm.com> wrote:
> >
> >> >diff -puN mm/filemap.c~a mm/filemap.c
> >> >--- 25/mm/filemap.c~a	2003-12-31 03:10:29.000000000 -0800
> >> >+++ 25-akpm/mm/filemap.c	2003-12-31 03:17:05.000000000 -0800
> >> >@@ -206,7 +206,13 @@ restart:
> >> > 		page_cache_get(page);
> >> > 		spin_unlock(&mapping->page_lock);
> >> > 
> >> >-		wait_on_page_writeback(page);
> >> >+		lock_page(page);
> >> >+		if (PageDirty(page) && mapping->a_ops->writepage) {
> >> >+			write_one_page(page, 1);
> >> >+		} else {
> >> >+			wait_on_page_writeback(page);
> >> >+			unlock_page(page);
> >> >+		}
> >> > 		if (PageError(page))
> >> > 			ret = -EIO;
> >> > 
> >> >
> >> >  
> >> >
> >> That fixed the problem!  Stephen's testcase is running successfully on 
> >> 2.6.1-mm1 plus your patch -- no more uninitialized data!
> >>
> >
> >Could you please test 2.6.1-mm2 with that patch?  If that works, send the
> >patch back to me?  (I lost it ;))
> >
> >
> I see uninitialized data when I test the patch (attached) on 2.6.1-mm2.  
> Any thoughts on a next step?
> 
> Thanks,
> -Janet
> 
> ______________________________________________________________________
> 
> diff -puN mm/filemap.c~a mm/filemap.c
> --- 25/mm/filemap.c~a   2003-12-31 03:10:29.000000000 -0800
> +++ 25-akpm/mm/filemap.c        2003-12-31 03:17:05.000000000 -0800
> @@ -209,7 +209,13 @@ restart:
> 		page_cache_get(page);
> 		spin_unlock(&mapping->page_lock);
> 
> -		wait_on_page_writeback(page);
> +		lock_page(page);
> +		if (PageDirty(page) && mapping->a_ops->writepage) {
> +			write_one_page(page, 1);
> +		} else {
> +			wait_on_page_writeback(page);
> +			unlock_page(page);
> +		}
> 		if (PageError(page))
> 			ret = -EIO;
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2004-01-11 23:14                               ` Janet Morgan
  2004-01-11 23:44                                 ` Andrew Morton
@ 2004-01-13  4:12                                 ` Janet Morgan
  1 sibling, 0 replies; 58+ messages in thread
From: Janet Morgan @ 2004-01-13  4:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: suparna, daniel, pbadari, linux-aio, linux-kernel

Janet Morgan wrote:

> Andrew Morton wrote:
>
>> Andrew Morton <akpm@osdl.org> wrote:
>>  
>>
>>> Let me actually think about this a bit.
>>>   
>>
>>
>> Nasty.  The same race is present in 2.4.x...
>>
>> How's about we start new I/O in filemap_fdatawait() if the page is 
>> dirty?
>>
>>
>> diff -puN mm/filemap.c~a mm/filemap.c
>> --- 25/mm/filemap.c~a    2003-12-31 03:10:29.000000000 -0800
>> +++ 25-akpm/mm/filemap.c    2003-12-31 03:17:05.000000000 -0800
>> @@ -206,7 +206,13 @@ restart:
>>         page_cache_get(page);
>>         spin_unlock(&mapping->page_lock);
>>
>> -        wait_on_page_writeback(page);
>> +        lock_page(page);
>> +        if (PageDirty(page) && mapping->a_ops->writepage) {
>> +            write_one_page(page, 1);
>> +        } else {
>> +            wait_on_page_writeback(page);
>> +            unlock_page(page);
>> +        }
>>         if (PageError(page))
>>             ret = -EIO;
>>
>>
>>  
>>
> That fixed the problem!  Stephen's testcase is running successfully on 
> 2.6.1-mm1 plus your patch -- no more uninitialized data!
>
Geez, I guess not.  While this was the first time the test ran 
successfully for me, it is not reproducible.  Subsequent runs on 
2.6.1-mm1 plus the above patch are seeing uninitialized data.  Sorry for
the false start.

-Janet



^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 2.6.2-rc3-mm1] DIO read race fix
  2004-01-09  3:55               ` Suparna Bhattacharya
@ 2004-02-05  1:39                 ` Daniel McNeil
  2004-02-05  1:54                   ` Badari Pulavarty
                                     ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Daniel McNeil @ 2004-02-05  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Janet Morgan, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Suparna Bhattacharya

[-- Attachment #1: Type: text/plain, Size: 1093 bytes --]

I have found (finally) the problem causing DIO reads racing with
buffered writes to see uninitialized data on ext3 file systems 
(which is what I have been testing on).

The problem is caused by the changes to __block_write_full_page()
and a race with journaling:

journal_commit_transaction() -> ll_rw_block() -> submit_bh()
	
ll_rw_block() locks the buffer, clears buffer dirty and calls
submit_bh()

A racing __block_write_full_page() (from ext3_ordered_writepage())

	would see that buffer_dirty() is not set because the i/o
	is still in flight, so it would not do a submit_bh()

	It would SetPageWriteback() and unlock_page() and then
	see that no i/o was submitted and call end_page_writeback()
	(with the i/o still in flight).

This would allow the DIO code to issue the DIO read while buffer writes
are still in flight.  The i/o can be reordered by i/o scheduling and
the DIO can complete BEFORE the writebacks complete.  Thus the DIO
sees the old uninitialized data.

Here is a quick hack that fixes it, but I am not sure if this is the
proper long-term fix.

Thoughts?

Daniel



[-- Attachment #2: 2.6.2-rc3-mm1.DIO-read_race.patch --]
[-- Type: text/plain, Size: 372 bytes --]

--- linux-2.6.2-rc3-mm1/fs/buffer.c	2004-02-04 17:12:43.823525259 -0800
+++ linux-2.6.2-rc3-mm1.patch/fs/buffer.c	2004-02-04 17:16:43.033252068 -0800
@@ -1810,6 +1810,7 @@ static int __block_write_full_page(struc
 
 	do {
 		get_bh(bh);
+		wait_on_buffer(bh);
 		if (buffer_mapped(bh) && buffer_dirty(bh)) {
 			if (wbc->sync_mode != WB_SYNC_NONE) {
 				lock_buffer(bh);

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
  2004-02-05  1:39                 ` [PATCH 2.6.2-rc3-mm1] DIO read race fix Daniel McNeil
@ 2004-02-05  1:54                   ` Badari Pulavarty
  2004-02-05  2:07                   ` Andrew Morton
  2004-02-05  5:33                   ` Andrew Morton
  2 siblings, 0 replies; 58+ messages in thread
From: Badari Pulavarty @ 2004-02-05  1:54 UTC (permalink / raw)
  To: Daniel McNeil, Andrew Morton
  Cc: Janet Morgan, linux-aio, Linux Kernel Mailing List, Suparna Bhattacharya

Hmmm !! You beat me by a few hours :)

It sounds convincing. I am not sure about the fix either.
Let me try the fix and see if it really fixes the problem.

Thanks,
Badari

On Wednesday 04 February 2004 05:39 pm, Daniel McNeil wrote:
> I have found (finally) the problem causing DIO reads racing with
> buffered writes to see uninitialized data on ext3 file systems
> (which is what I have been testing on).
>
> The problem is caused by the changes to __block_write_full_page()
> and a race with journaling:
>
> journal_commit_transaction() -> ll_rw_block() -> submit_bh()
>
> ll_rw_block() locks the buffer, clears buffer dirty and calls
> submit_bh()
>
> A racing __block_write_full_page() (from ext3_ordered_writepage())
>
> 	would see that buffer_dirty() is not set because the i/o
>         is still in flight, so it would not do a submit_bh()
>
> 	It would SetPageWriteback() and unlock_page() and then
> 	see that no i/o was submitted and call end_page_writeback()
> 	(with the i/o still in flight).
>
> This would allow the DIO code to issue the DIO read while buffer writes
> are still in flight.  The i/o can be reordered by i/o scheduling and
> the DIO can complete BEFORE the writebacks complete.  Thus the DIO
> sees the old uninitialized data.
>
> Here is a quick hack that fixes it, but I am not sure if this the
> proper long term fix.
>
> Thoughts?
>
> Daniel


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
  2004-02-05  1:39                 ` [PATCH 2.6.2-rc3-mm1] DIO read race fix Daniel McNeil
  2004-02-05  1:54                   ` Badari Pulavarty
@ 2004-02-05  2:07                   ` Andrew Morton
  2004-02-05  2:54                     ` Janet Morgan
  2004-02-05  5:33                   ` Andrew Morton
  2 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2004-02-05  2:07 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: janetmor, pbadari, linux-aio, linux-kernel, suparna

Daniel McNeil <daniel@osdl.org> wrote:
>
> I have found (finally) the problem causing DIO reads racing with
> buffered writes to see uninitialized data on ext3 file systems 
> (which is what I have been testing on).

What kernel?  If -mm, is this the only remaining buffered-vs-direct
problem?

> The problem is caused by the changes to __block_write_full_page()
> and a race with journaling:
> 
> journal_commit_transaction() -> ll_rw_block() -> submit_bh()
> 	
> ll_rw_block() locks the buffer, clears buffer dirty and calls
> submit_bh()
> 
> A racing __block_write_full_page() (from ext3_ordered_writepage())
> 
> 	would see that buffer_dirty() is not set because the i/o
>          is still in flight, so it would not do a submit_bh()
> 
> 	It would SetPageWriteback() and unlock_page() and then
> 	see that no i/o was submitted and call end_page_writeback()
> 	(with the i/o still in flight).
> 
> This would allow the DIO code to issue the DIO read while buffer writes
> are still in flight.  The i/o can be reordered by i/o scheduling and
> the DIO can complete BEFORE the writebacks complete.  Thus the DIO
> sees the old uninitialized data.

ow.  How'd you work this out?

> Here is a quick hack that fixes it, but I am not sure if this the
> proper long term fix.

The problem is that ext3 and the VFS are using different paradigms.  VFS is
all page-based, but ext3 is all block-based.  One or the other needs to do
something nasty.

One approach would be to change the JBD write_out_data_locked loop to use
block_write_full_page(bh->b_page) instead of buffer_head operations.  But
that could get hairy with blocksize < PAGE_SIZE.

Thanks for working this out.  Let me ponder it for a bit.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
  2004-02-05  2:07                   ` Andrew Morton
@ 2004-02-05  2:54                     ` Janet Morgan
  2004-02-05  3:19                       ` Andrew Morton
  0 siblings, 1 reply; 58+ messages in thread
From: Janet Morgan @ 2004-02-05  2:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Daniel McNeil, pbadari, linux-aio, linux-kernel, suparna

Andrew Morton wrote:

>Daniel McNeil <daniel@osdl.org> wrote:
>  
>
>>I have found (finally) the problem causing DIO reads racing with
>>buffered writes to see uninitialized data on ext3 file systems 
>>(which is what I have been testing on).
>>    
>>
>
>What kernel?  If -mm, is this the only remaining buffered-vs-direct
>problem?
>
>  
>
I think there was consensus on two other patches along the way:

    http://marc.theaimsgroup.com/?l=linux-kernel&m=107286971311559&w=2
    http://marc.theaimsgroup.com/?l=linux-aio&m=107291089712224&w=2

-Janet

>>The problem is caused by the changes to __block_write_full_page()
>>and a race with journaling:
>>
>>journal_commit_transaction() -> ll_rw_block() -> submit_bh()
>>	
>>ll_rw_block() locks the buffer, clears buffer dirty and calls
>>submit_bh()
>>
>>A racing __block_write_full_page() (from ext3_ordered_writepage())
>>
>>	would see that buffer_dirty() is not set because the i/o
>>        is still in flight, so it would not do a submit_bh()
>>
>>	It would SetPageWriteback() and unlock_page() and then
>>	see that no i/o was submitted and call end_page_writeback()
>>	(with the i/o still in flight).
>>
>>This would allow the DIO code to issue the DIO read while buffer writes
>>are still in flight.  The i/o can be reordered by i/o scheduling and
>>the DIO can complete BEFORE the writebacks complete.  Thus the DIO
>>sees the old uninitialized data.
>>    
>>
>
>ow.  How'd you work this out?
>
>  
>
>>Here is a quick hack that fixes it, but I am not sure if this the
>>proper long term fix.
>>    
>>
>
>The problem is that ext3 and the VFS are using different paradigms.  VFS is
>all page-based, but ext3 is all block-based.  One or the other needs to do
>something nasty.
>
>One approach would be to change the JBD write_out_data_locked loop to use
>block_write_full_page(bh->b_page) instead of buffer_head operations.  But
>that could get hairy with blocksize < PAGE_SIZE.
>
>Thanks for working this out.  Let me ponder it for a bit.
>
>  
>



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
  2004-02-05  2:54                     ` Janet Morgan
@ 2004-02-05  3:19                       ` Andrew Morton
  2004-02-05  3:43                         ` Suparna Bhattacharya
  0 siblings, 1 reply; 58+ messages in thread
From: Andrew Morton @ 2004-02-05  3:19 UTC (permalink / raw)
  To: Janet Morgan; +Cc: daniel, pbadari, linux-aio, linux-kernel, suparna

Janet Morgan <janetmor@us.ibm.com> wrote:
>
> >Daniel McNeil <daniel@osdl.org> wrote:
>  >  
>  >
>  >>I have found (finally) the problem causing DIO reads racing with
>  >>buffered writes to see uninitialized data on ext3 file systems 
>  >>(which is what I have been testing on).
>  >>    
>  >>
>  >
>  >What kernel?  If -mm, is this the only remaining buffered-vs-direct
>  >problem?
>  >
>  >  
>  >
>  I think there was consensus on two other patches along the way:
> 
>      http://marc.theaimsgroup.com/?l=linux-kernel&m=107286971311559&w=2
>      http://marc.theaimsgroup.com/?l=linux-aio&m=107291089712224&w=2

Yes, I think those are needed but this thing has been dragging on for so
long it has become a little unclear.  This was the main reason why I backed
off the fs-aio patches.

Daniel, could you please work out whether we actually need those patches
and if so, prep them for us?  Presumably if ext2 passes all testing without
those patches, we do not need them.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
  2004-02-05  3:19                       ` Andrew Morton
@ 2004-02-05  3:43                         ` Suparna Bhattacharya
  0 siblings, 0 replies; 58+ messages in thread
From: Suparna Bhattacharya @ 2004-02-05  3:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Janet Morgan, daniel, pbadari, linux-aio, linux-kernel

On Wed, Feb 04, 2004 at 07:19:21PM -0800, Andrew Morton wrote:
> Janet Morgan <janetmor@us.ibm.com> wrote:
> >
> > >Daniel McNeil <daniel@osdl.org> wrote:
> >  >  
> >  >
> >  >>I have found (finally) the problem causing DIO reads racing with
> >  >>buffered writes to see uninitialized data on ext3 file systems 
> >  >>(which is what I have been testing on).
> >  >>    
> >  >>
> >  >
> >  >What kernel?  If -mm, is this the only remaining buffered-vs-direct
> >  >problem?
> >  >
> >  >  
> >  >
> >  I think there was consensus on two other patches along the way:
> > 
> >      http://marc.theaimsgroup.com/?l=linux-kernel&m=107286971311559&w=2
> >      http://marc.theaimsgroup.com/?l=linux-aio&m=107291089712224&w=2
> 
> Yes, I think those are needed but this thing has been dragging on for so
> long it has become a little unclear.  This was the main reason why I backed
> off the fs-aio patches.
> 
> Daniel, could you please work out whether we actually need those patches
> and if so, prep them for us?  Presumably if ext2 passes all testing without
> those patches, we do not need them.

I think we agreed from a logical standpoint that those patches are correct
and needed, didn't we?

Whether or not we encounter those races in our testing is to a certain
extent a matter of chance. I wouldn't take all tests passing on ext2
as proof that they aren't needed; it just tells us that the races may
be less likely to hit.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
  2004-02-05  1:39                 ` [PATCH 2.6.2-rc3-mm1] DIO read race fix Daniel McNeil
  2004-02-05  1:54                   ` Badari Pulavarty
  2004-02-05  2:07                   ` Andrew Morton
@ 2004-02-05  5:33                   ` Andrew Morton
  2004-02-05 17:52                     ` Daniel McNeil
  2004-02-05 18:53                     ` Badari Pulavarty
  2 siblings, 2 replies; 58+ messages in thread
From: Andrew Morton @ 2004-02-05  5:33 UTC (permalink / raw)
  To: Daniel McNeil; +Cc: janetmor, pbadari, linux-aio, linux-kernel, suparna

Daniel McNeil <daniel@osdl.org> wrote:
>
>  I have found (finally) the problem causing DIO reads racing with
>  buffered writes to see uninitialized data on ext3 file systems 
>  (which is what I have been testing on).
> 
>  The problem is caused by the changes to __block_write_full_page()
>  and a race with journaling:
> 
>  journal_commit_transaction() -> ll_rw_block() -> submit_bh()
>  	
>  ll_rw_block() locks the buffer, clears buffer dirty and calls
>  submit_bh()
> 
>  A racing __block_write_full_page() (from ext3_ordered_writepage())
> 
>  	would see that buffer_dirty() is not set because the i/o
>          is still in flight, so it would not do a submit_bh()
> 
>  	It would SetPageWriteback() and unlock_page() and then
>  	see that no i/o was submitted and call end_page_writeback()
>  	(with the i/o still in flight).
> 
>  This would allow the DIO code to issue the DIO read while buffer writes
>  are still in flight.  The i/o can be reordered by i/o scheduling and
>  the DIO can complete BEFORE the writebacks complete.  Thus the DIO
>  sees the old uninitialized data.

I suppose we should go for a general fix to the problem.  I'm not 100%
happy with it.  It's similar to yours, except we only wait if
wbc->sync_mode says it's a write-for-sync.  Also we hold the buffer lock
across all the tests.






Fix a race which was identified by Daniel McNeil <daniel@osdl.org>

If a buffer_head is under I/O due to JBD's ordered data writeout (which uses
ll_rw_block()) then either filemap_fdatawrite() or filemap_fdatawait() need
to wait on the buffer's existing I/O.

Presently neither will do so, because __block_write_full_page() will not
actually submit any I/O and will hence not mark the page as being under
writeback.

The best-performing fix would be to somehow mark the page as being under
writeback and defer waiting for the ll_rw_block-initiated I/O until
filemap_fdatawait()-time.  But this is hard, because in
__block_write_full_page() we do not have control of the buffer_head's end_io
handler.  Possibly we could make JBD call into end_buffer_async_write(), but
that gets nasty.

This patch makes __block_write_full_page() wait for any buffer_head I/O to
complete before inspecting the buffer_head state.  It only does this in the
case where __block_write_full_page() was called for a "data-integrity" write:
(wbc->sync_mode != WB_SYNC_NONE).

Probably it doesn't matter, because kjournald is currently submitting (or has
already submitted) all dirty buffers anyway.



---

 fs/buffer.c |   29 +++++++++++++++--------------
 1 files changed, 15 insertions(+), 14 deletions(-)

diff -puN fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix fs/buffer.c
--- 25/fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix	2004-02-04 20:38:30.000000000 -0800
+++ 25-akpm/fs/buffer.c	2004-02-04 20:40:19.000000000 -0800
@@ -1810,23 +1810,24 @@ static int __block_write_full_page(struc
 
 	do {
 		get_bh(bh);
-		if (buffer_mapped(bh) && buffer_dirty(bh)) {
-			if (wbc->sync_mode != WB_SYNC_NONE) {
-				lock_buffer(bh);
-			} else {
-				if (test_set_buffer_locked(bh)) {
+		if (!buffer_mapped(bh))
+			continue;
+		if (wbc->sync_mode != WB_SYNC_NONE) {
+			lock_buffer(bh);
+		} else {
+			if (test_set_buffer_locked(bh)) {
+				if (buffer_dirty(bh))
 					__set_page_dirty_nobuffers(page);
-					continue;
-				}
-			}
-			if (test_clear_buffer_dirty(bh)) {
-				if (!buffer_uptodate(bh))
-					buffer_error();
-				mark_buffer_async_write(bh);
-			} else {
-				unlock_buffer(bh);
+				continue;
 			}
 		}
+		if (test_clear_buffer_dirty(bh)) {
+			if (!buffer_uptodate(bh))
+				buffer_error();
+			mark_buffer_async_write(bh);
+		} else {
+			unlock_buffer(bh);
+		}
 	} while ((bh = bh->b_this_page) != head);
 
 	BUG_ON(PageWriteback(page));

_


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
  2004-02-05  5:33                   ` Andrew Morton
@ 2004-02-05 17:52                     ` Daniel McNeil
  2004-02-05 18:53                     ` Badari Pulavarty
  1 sibling, 0 replies; 58+ messages in thread
From: Daniel McNeil @ 2004-02-05 17:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: janetmor, Badari Pulavarty, linux-aio, Linux Kernel Mailing List,
	Suparna Bhattacharya

Andrew,

I am still thinking about your patch.  I will run some tests today using
2.6.2-mm1 to see if the problem is fixed.  My 8-proc machine ran
overnight with 6 copies of the read_under running without problems with
my original patch.  Previously on the 8-proc machine, it would hit
uninitialized data within an hour.

The concern I have is that DIO needs filemap_write_and_wait() to
make sure all previously dirty pages have been written back to
disk before the DIO is issued.

If __block_write_full_page() can possibly clear PageWriteback 
with buffer i/o still in flight (even for WB_SYNC_NONE) then
a subsequent filemap_write_and_wait() will miss that page.

For example, I previously tried:

        do {
                get_bh(bh);
> +		if (wbc->sync_mode != WB_SYNC_NONE)
> +			wait_on_buffer(bh);
                if (buffer_mapped(bh) && buffer_dirty(bh)) {
                        if (wbc->sync_mode != WB_SYNC_NONE) {
                                lock_buffer(bh);

and this still saw uninitialized data.

Also, if __block_write_full_page() can redirty a page, wouldn't this
allow filemap_write_and_wait() to return with a page still dirty that
DIO needs written back?
	 
I'll work on updating the other patches.

Daniel



On Wed, 2004-02-04 at 21:33, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> >  I have found (finally) the problem causing DIO reads racing with
> >  buffered writes to see uninitialized data on ext3 file systems 
> >  (which is what I have been testing on).
> > 
> >  The problem is caused by the changes to __block_write_full_page()
> >  and a race with journaling:
> > 
> >  journal_commit_transaction() -> ll_rw_block() -> submit_bh()
> >  	
> >  ll_rw_block() locks the buffer, clears buffer dirty and calls
> >  submit_bh()
> > 
> >  A racing __block_write_full_page() (from ext3_ordered_writepage())
> > 
> >  	would see that buffer_dirty() is not set because the i/o
> >          is still in flight, so it would not do a submit_bh()
> > 
> >  	It would SetPageWriteback() and unlock_page() and then
> >  	see that no i/o was submitted and call end_page_writeback()
> >  	(with the i/o still in flight).
> > 
> >  This would allow the DIO code to issue the DIO read while buffer writes
> >  are still in flight.  The i/o can be reordered by i/o scheduling and
> >  the DIO can complete BEFORE the writebacks complete.  Thus the DIO
> >  sees the old uninitialized data.
> 
> I suppose we should go for a general fix to the problem.  I'm not 100%
> happy with it.  It's similar to yours, except we only wait if
> wbc->sync_mode says it's a write-for-sync.  Also we hold the buffer lock
> across all the tests.
> 
> 
> 
> 
> 
> 
> Fix a race which was identified by Daniel McNeil <daniel@osdl.org>
> 
> If a buffer_head is under I/O due to JBD's ordered data writeout (which uses
> ll_rw_block()) then either filemap_fdatawrite() or filemap_fdatawait() need
> to wait on the buffer's existing I/O.
> 
> Presently neither will do so, because __block_write_full_page() will not
> actually submit any I/O and will hence not mark the page as being under
> writeback.
> 
> The best-performing fix would be to somehow mark the page as being under
> writeback and defer waiting for the ll_rw_block-initiated I/O until
> filemap_fdatawait()-time.  But this is hard, because in
> __block_write_full_page() we do not have control of the buffer_head's end_io
> handler.  Possibly we could make JBD call into end_buffer_async_write(), but
> that gets nasty.
> 
> This patch makes __block_write_full_page() wait for any buffer_head I/O to
> complete before inspecting the buffer_head state.  It only does this in the
> case where __block_write_full_page() was called for a "data-integrity" write:
> (wbc->sync_mode != WB_SYNC_NONE).
> 
> Probably it doesn't matter, because kjournald is currently submitting (or has
> already submitted) all dirty buffers anyway.
> 
> 
> 
> ---
> 
>  fs/buffer.c |   29 +++++++++++++++--------------
>  1 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff -puN fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix fs/buffer.c
> --- 25/fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix	2004-02-04 20:38:30.000000000 -0800
> +++ 25-akpm/fs/buffer.c	2004-02-04 20:40:19.000000000 -0800
> @@ -1810,23 +1810,24 @@ static int __block_write_full_page(struc
>  
>  	do {
>  		get_bh(bh);
> -		if (buffer_mapped(bh) && buffer_dirty(bh)) {
> -			if (wbc->sync_mode != WB_SYNC_NONE) {
> -				lock_buffer(bh);
> -			} else {
> -				if (test_set_buffer_locked(bh)) {
> +		if (!buffer_mapped(bh))
> +			continue;
> +		if (wbc->sync_mode != WB_SYNC_NONE) {
> +			lock_buffer(bh);
> +		} else {
> +			if (test_set_buffer_locked(bh)) {
> +				if (buffer_dirty(bh))
>  					__set_page_dirty_nobuffers(page);
> -					continue;
> -				}
> -			}
> -			if (test_clear_buffer_dirty(bh)) {
> -				if (!buffer_uptodate(bh))
> -					buffer_error();
> -				mark_buffer_async_write(bh);
> -			} else {
> -				unlock_buffer(bh);
> +				continue;
>  			}
>  		}
> +		if (test_clear_buffer_dirty(bh)) {
> +			if (!buffer_uptodate(bh))
> +				buffer_error();
> +			mark_buffer_async_write(bh);
> +		} else {
> +			unlock_buffer(bh);
> +		}
>  	} while ((bh = bh->b_this_page) != head);
>  
>  	BUG_ON(PageWriteback(page));
> 
> _


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 2.6.2-rc3-mm1] DIO read race fix
  2004-02-05  5:33                   ` Andrew Morton
  2004-02-05 17:52                     ` Daniel McNeil
@ 2004-02-05 18:53                     ` Badari Pulavarty
  1 sibling, 0 replies; 58+ messages in thread
From: Badari Pulavarty @ 2004-02-05 18:53 UTC (permalink / raw)
  To: Andrew Morton, Daniel McNeil; +Cc: janetmor, linux-aio, linux-kernel, suparna

Andrew,

I see uninitialized data with this patch :(
I patched it manually, so I might have missed something.
I will double check and let you know if I patched it incorrectly.


FYI.

Thanks,
Badari

On Wednesday 04 February 2004 09:33 pm, Andrew Morton wrote:

> Fix a race which was identified by Daniel McNeil <daniel@osdl.org>
>
> If a buffer_head is under I/O due to JBD's ordered data writeout (which
> uses ll_rw_block()) then either filemap_fdatawrite() or filemap_fdatawait()
> need to wait on the buffer's existing I/O.
>
> Presently neither will do so, because __block_write_full_page() will not
> actually submit any I/O and will hence not mark the page as being under
> writeback.
>
> The best-performing fix would be to somehow mark the page as being under
> writeback and defer waiting for the ll_rw_block-initiated I/O until
> filemap_fdatawait()-time.  But this is hard, because in
> __block_write_full_page() we do not have control of the buffer_head's
> end_io handler.  Possibly we could make JBD call into
> end_buffer_async_write(), but that gets nasty.
>
> This patch makes __block_write_full_page() wait for any buffer_head I/O to
> complete before inspecting the buffer_head state.  It only does this in the
> case where __block_write_full_page() was called for a "data-integrity"
> write: (wbc->sync_mode != WB_SYNC_NONE).
>
> Probably it doesn't matter, because kjournald is currently submitting (or
> has already submitted) all dirty buffers anyway.
>
>
>
> ---
>
>  fs/buffer.c |   29 +++++++++++++++--------------
>  1 files changed, 15 insertions(+), 14 deletions(-)
>
> diff -puN fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix
> fs/buffer.c ---
> 25/fs/buffer.c~O_DIRECT-ll_rw_block-vs-block_write_full_page-fix	2004-02-04
> 20:38:30.000000000 -0800 +++ 25-akpm/fs/buffer.c	2004-02-04
> 20:40:19.000000000 -0800
> @@ -1810,23 +1810,24 @@ static int __block_write_full_page(struc
>
>  	do {
>  		get_bh(bh);
> -		if (buffer_mapped(bh) && buffer_dirty(bh)) {
> -			if (wbc->sync_mode != WB_SYNC_NONE) {
> -				lock_buffer(bh);
> -			} else {
> -				if (test_set_buffer_locked(bh)) {
> +		if (!buffer_mapped(bh))
> +			continue;
> +		if (wbc->sync_mode != WB_SYNC_NONE) {
> +			lock_buffer(bh);
> +		} else {
> +			if (test_set_buffer_locked(bh)) {
> +				if (buffer_dirty(bh))
>  					__set_page_dirty_nobuffers(page);
> -					continue;
> -				}
> -			}
> -			if (test_clear_buffer_dirty(bh)) {
> -				if (!buffer_uptodate(bh))
> -					buffer_error();
> -				mark_buffer_async_write(bh);
> -			} else {
> -				unlock_buffer(bh);
> +				continue;
>  			}
>  		}
> +		if (test_clear_buffer_dirty(bh)) {
> +			if (!buffer_uptodate(bh))
> +				buffer_error();
> +			mark_buffer_async_write(bh);
> +		} else {
> +			unlock_buffer(bh);
> +		}
>  	} while ((bh = bh->b_this_page) != head);
>
>  	BUG_ON(PageWriteback(page));
>
> _


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
  2003-12-31  0:29         ` Daniel McNeil
  2003-12-31  6:09           ` Suparna Bhattacharya
@ 2004-03-29 15:41           ` Suparna Bhattacharya
  1 sibling, 0 replies; 58+ messages in thread
From: Suparna Bhattacharya @ 2004-03-29 15:41 UTC (permalink / raw)
  To: Administrator
  Cc: Janet Morgan, Badari Pulavarty, linux-aio,
	Linux Kernel Mailing List, Andrew Morton

On Tue, Dec 30, 2003 at 04:29:17PM -0800, Daniel McNeil wrote:
> On Mon, 2003-12-29 at 20:53, Suparna Bhattacharya wrote:
> > On Thu, Dec 11, 2003 at 04:51:33PM -0800, Daniel McNeil wrote:
> > > I've done more testing with added debug code.
> > > 
> > > It looks like the filemap_write_and_wait() call is returning
> > > with data that has not made it disk.
> > > 
> > > I added code to filemap_write_and_wait() to check if
> > > mapping->dirty_pages is not empty after calling filemap_fdatawrite()
> > > and filemap_fdatawait() and retry.  Even with the retry, the test still
> > > sees uninitialized data when running tests overnight (I added a printk
> > > so I know the retry is happening).  There are pages left on the
> > > dirty_pages list even after the write and wait.   
> > 
> > There are two calls to filemap_write_and_wait() for a DIO read
> > -- do you know in which of these cases you see dirty pages after
> > the write and wait ? The first is called without i_sem protection,
> > so it is possible for pages to be dirtied by a parallel buffered
> > write which is OK because this is just an extra/superfluous call 
> > when it comes to DIO reads. The second call, however, happens with i_sem 
> > held and is used to guarantee that there are no exposures, so if 
> > you are seeing remnant dirty pages in this case it would be something 
> > to worry about.
> > 
> 
> Yes there are two calls.  I'm assuming that the retry was caused by the
> call w/o holding the i_sem, since pages can be dirtied in parallel.
> 
> My debug output shows every filemap_write_and_wait() and which pages are
> on the dirty, io, and locked list and which pages are on the clean list
> after the wait. (yes, this is lots of output)
> 
> I changed the test a little bit -- if the test sees non-zero data, it
> opens a new file descriptor and re-reads the data without O_DIRECT and
> also re-reads the 1st page of the non-zero data using O_DIRECT.
> 
> What the debug output shows is:
> 
> The 1st filemap_write_and_wait() writes out some data.
> 
> The 2nd filemap_write_and_wait() sees different dirty pages.  The pages
>   that are seen non-zero by the test (many pages -- 243 in one case) are
>   on the dirty_pages list before the write and on the clean_pages list
>   after the wait.  But some of the pages at the end of the clean list,
>   are seen by the test program as non-zero.
> 
> Since I added the DIO re-read, the debug output shows for the 2nd
>   re-read:
> 
> the 1st filemap_write_and_wait() with NO dirty pages
> the 2nd filemap_write_and_wait() with dirty pages AFTER the pages
>   that mis-matched and the original plus these pages on the clean
>   list after the wait.
> 
> The 2nd re-read and the read without O_DIRECT both get zeroed data.
> 
> Conclusions:
> ===========
> 
> I'm not sure. :(
> 
> It appears like the pages are put on the clean list and PG_writeback is
> cleared while the i/o is in flight and the DIO reads are put on the
> queue ahead of the writeback.
> 
> I am trying to figure out how to add more debug code to prove this
> or figure out what else is going on.
> 
> I'm open to suggestions.

Since the first filemap_write_and_wait() call is redundant and somewhat
suspect since it's called w/o i_sem (I can think of unexpected side effects
on parallel filemap_write_and_wait() calls), have you thought of disabling
it and then trying to see if you can still recreate the problem?  It may
not make a difference, but it seems like the right thing to do and could
at least simplify some of the debugging.

Regards
Suparna

> 
> Daniel
> 

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch
  2004-01-02  5:50                               ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Suparna Bhattacharya
  2004-01-02  7:31                                 ` Andrew Morton
  2004-01-05 13:49                                 ` Marcelo Tosatti
@ 2004-03-29 15:44                                 ` Marcelo Tosatti
  2 siblings, 0 replies; 58+ messages in thread
From: Marcelo Tosatti @ 2004-03-29 15:44 UTC (permalink / raw)
  To: Administrator
  Cc: Andrew Morton, daniel, janetmor, pbadari, linux-aio, linux-kernel



On Fri, 2 Jan 2004, Suparna Bhattacharya wrote:

> On Wed, Dec 31, 2003 at 03:17:36AM -0800, Andrew Morton wrote:
> > Andrew Morton <akpm@osdl.org> wrote:
> > >
> > > Let me actually think about this a bit.
> >
> > Nasty.  The same race is present in 2.4.x...

filemap_fdatawait() is always called with i_sem held and
there is no "!PG_dirty and !PG_writeback" window.

Where does the race lie in 2.4?

Daniel, it would be interesting to know if the direct I/O tests also fail
on 2.4.

> > How's about we start new I/O in filemap_fdatawait() if the page is
> > dirty?

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2004-03-29 15:44 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <3FCD4B66.8090905@us.ibm.com>
2003-12-06  1:29 ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Daniel McNeil
2003-12-08 18:23   ` Daniel McNeil
2003-12-12  0:51     ` Daniel McNeil
2003-12-17  1:25       ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Daniel McNeil
2003-12-17  2:03         ` Andrew Morton
2003-12-17 19:25           ` Daniel McNeil
2003-12-17 20:17             ` Janet Morgan
2003-12-31  9:18           ` Suparna Bhattacharya
2003-12-31  9:35             ` Andrew Morton
2003-12-31  9:55               ` Suparna Bhattacharya
2003-12-31  9:59                 ` Andrew Morton
2003-12-31 10:09                   ` Suparna Bhattacharya
2003-12-31 10:10                     ` Andrew Morton
2003-12-31 10:48                       ` Suparna Bhattacharya
2003-12-31 10:53                         ` Andrew Morton
2003-12-31 10:54                           ` Andrew Morton
2003-12-31 11:17                             ` Andrew Morton
2003-12-31 22:34                               ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Daniel McNeil
2003-12-31 22:41                                 ` [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch Daniel McNeil
2003-12-31 23:46                                   ` Andrew Morton
2004-01-02  5:14                                     ` Suparna Bhattacharya
2004-01-02  7:46                                       ` Andrew Morton
2004-01-05  3:55                                         ` Suparna Bhattacharya
2004-01-05  5:06                                           ` Andrew Morton
2004-01-05  5:28                                             ` Suparna Bhattacharya
2004-01-05  5:28                                               ` Andrew Morton
2004-01-05  6:06                                                 ` Suparna Bhattacharya
2004-01-05  6:14                                                 ` Lincoln Dale
2003-12-31 22:47                                 ` [PATCH linux-2.6.1-rc1-mm1] dio_isize.patch Daniel McNeil
2003-12-31 23:42                                 ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Andrew Morton
2004-01-02  4:20                                   ` Suparna Bhattacharya
2004-01-02  4:36                                     ` Andrew Morton
2004-01-02  5:50                               ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Suparna Bhattacharya
2004-01-02  7:31                                 ` Andrew Morton
2004-01-05 13:49                                 ` Marcelo Tosatti
2004-01-05 20:27                                   ` Andrew Morton
2004-03-29 15:44                                 ` Marcelo Tosatti
2004-01-11 23:14                               ` Janet Morgan
2004-01-11 23:44                                 ` Andrew Morton
2004-01-12 18:00                                   ` filemap_fdatawait.patch Daniel McNeil
2004-01-12 19:39                                   ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Janet Morgan
2004-01-12 19:46                                     ` Daniel McNeil
2004-01-13  4:12                                 ` Janet Morgan
2003-12-30  4:53       ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Suparna Bhattacharya
2003-12-31  0:29         ` Daniel McNeil
2003-12-31  6:09           ` Suparna Bhattacharya
2004-01-08 23:55             ` Daniel McNeil
2004-01-09  3:55               ` Suparna Bhattacharya
2004-02-05  1:39                 ` [PATCH 2.6.2-rc3-mm1] DIO read race fix Daniel McNeil
2004-02-05  1:54                   ` Badari Pulavarty
2004-02-05  2:07                   ` Andrew Morton
2004-02-05  2:54                     ` Janet Morgan
2004-02-05  3:19                       ` Andrew Morton
2004-02-05  3:43                         ` Suparna Bhattacharya
2004-02-05  5:33                   ` Andrew Morton
2004-02-05 17:52                     ` Daniel McNeil
2004-02-05 18:53                     ` Badari Pulavarty
2004-03-29 15:41           ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Suparna Bhattacharya

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).