From: Daniel McNeil <daniel@osdl.org>
To: Janet Morgan <janetmor@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>,
Badari Pulavarty <pbadari@us.ibm.com>,
"linux-aio@kvack.org" <linux-aio@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>
Subject: Re: [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix
Date: 11 Dec 2003 16:51:33 -0800 [thread overview]
Message-ID: <1071190292.1937.13.camel@ibm-c.pdx.osdl.net> (raw)
In-Reply-To: <1070907814.707.2.camel@ibm-c.pdx.osdl.net>
I've done more testing with added debug code.
It looks like the filemap_write_and_wait() call is returning
with data that has not made it disk.
I added code to filemap_write_and_wait() to check if
mapping->dirty_pages is not empty after calling filemap_fdatawrite()
and filemap_fdatawait() and retry. Even with the retry, the test still
sees uninitialized data when running tests overnight (I added a printk
so I know the retry is happening). There are pages left on the
dirty_pages list even after the write and wait.
I've added a bunch more debug code and am currently running the test
again to see if I can find out what is going on.
I'll send more email when I know more.
Daniel
On Mon, 2003-12-08 at 10:23, Daniel McNeil wrote:
> My patch did not fix the problem. :(
>
> The tests over the weekend reported bad data.
>
> The sampling of the i_size after dropping i_sem still looks wrong
> to me. I will keep looking and see if I can find another problem.
>
>
> Daniel
>
> On Fri, 2003-12-05 at 17:29, Daniel McNeil wrote:
> > Janet,
> >
> > I think I found the problem that is causing the dio reads to get
> > unintialized data with buffer writes. i_size is sampled after
> > i_sem is dropped and after the i/o have completed. i_size could
> > have changed in the mean time. This patch samples i_size before
> > dropping i_sem. I'll leave the test running over the weekend to
> > verify.
> >
> > Daniel
> >
> > On Tue, 2003-12-02 at 18:33, Janet Morgan wrote:
> > > Hi Suparna, Daniel,
> > >
> > > Just wanted to let you know that I seem to be hitting a failure when I
> > > run the testcase (attached) that Stephen Tweedie wrote to expose the DIO
> > > races (the testcase issues buffered writes and dio reads). I'm trying to
> > > debug the failure, but I know it will be slow going for me.
> > >
> > > I hit the problem when I run multiple instances of the test against
> > > the same filesystem (I do specify a unique file for each instance of
> > > the test). I normally run about 6 instances of the test, e.g.,
> > > "direct_read_under foo1", "direct_read_under foo2", etc. The test
> > > runs in an infinite loop until a data mis-compare is detected.
> > > I have not been able to reproduce the failure when I restrict each
> > > instance of the test to its own filesystem.
> > >
> > > I've tried running with the combinations below and I eventually
> > > get a failure in each case. I assume all but the last combo includes
> > > the critical aio/dio fixes for the associated base.
> > >
> > > Combinations tested:
> > >
> > > Daniel's latest:
> > > 2.6.0-test9-mm5 +
> > > aio-dio-fallback-bio_count-race.patch +
> > > direct-io-memleak-fix.patch
> > >
> > > Suparna's latest:
> > > 2.6.0-test9-mm5 +
> > > Suparna's current patches:
> > >
> > > http://marc.theaimsgroup.com/?l=linux-aio&m=106983304420570&w=2 +
> > >
> > > http://marc.theaimsgroup.com/?l=linux-kernel&m=106904658121299&w=2 +
> > > direct-io-memleak-fix.patch
> > >
> > > stock linux-2.6.0-test9 plus:
> > > aio-refcnt.patch +
> > > dio-aio-fixes.patch +
> > > dio-aio-fixes-fixes.patch +
> > > O_DIRECT-race-fixes-rollup.patch
> > >
> > > stock linux-2.6.0-test11 plus:
> > > O_DIRECT-race-fixes-rollup.patch
> > >
> > > Thanks,
> > > -Janet
> > >
> > > ______________________________________________________________________
> > >
> > > #define _XOPEN_SOURCE 600
> > > #define _GNU_SOURCE
> > >
> > > #include <unistd.h>
> > > #include <stdlib.h>
> > > #include <stdio.h>
> > > #include <string.h>
> > > #include <errno.h>
> > > #include <sys/fcntl.h>
> > > #include <sys/mman.h>
> > > #include <sys/wait.h>
> > >
> > > #define BIGSIZE 128*1024*1024
> > > #define READSIZE 32*1024*1024
> > > #define WRITESIZE 32*1024*1024
> > >
> > > int pagesize;
> > > char *iobuf;
> > >
> > > void assert(const char *what, int assertion)
> > > {
> > > if (assertion)
> > > return;
> > > perror(what);
> > > exit(1);
> > > }
> > >
> > > void do_buffered_writes(int fd, int pattern)
> > > {
> > > int rc;
> > > int offset;
> > >
> > > memset(iobuf, pattern, WRITESIZE);
> > > for (offset = 0; offset+WRITESIZE <= BIGSIZE; offset += WRITESIZE) {
> > > rc = pwrite(fd, iobuf, WRITESIZE, offset);
> > > assert("pwrite", rc >= 0);
> > > if (rc != WRITESIZE) {
> > > fprintf(stderr, "short write (%d out of %d)\n",
> > > rc, WRITESIZE);
> > > exit(1);
> > > }
> > > fsync(fd);
> > > }
> > > }
> > >
> > > int do_direct_reads(char *filename)
> > > {
> > > int fd;
> > > int offset;
> > > int rc, i;
> > > int *p;
> > >
> > > fd = open(filename, O_DIRECT|O_RDONLY, 0);
> > > assert("open", fd >= 0);
> > >
> > > for (offset = 0; offset+READSIZE <= BIGSIZE; offset += READSIZE) {
> > > rc = pread(fd, iobuf, READSIZE, offset);
> > > assert("pread", rc >= 0);
> > > if (rc != READSIZE) {
> > > fprintf(stderr, "short read (%d out of %d)\n",
> > > rc, READSIZE);
> > > exit(1);
> > > }
> > > for (i=0, p=(int *)iobuf; i<READSIZE; i+=4) {
> > > if (*p) {
> > > fprintf(stderr,
> > > "Found data (%08x) at offset %d+%d\n",
> > > *p, offset, i);
> > > return 1;
> > > }
> > > p++;
> > > }
> > > }
> > > return 0;
> > > }
> > >
> > > int main(int argc, char *argv[])
> > > {
> > > char *filename;
> > > int fd;
> > > int pid;
> > > int err;
> > > int pass = 0;
> > > int bufsize;
> > >
> > > if (argc != 2) {
> > > fprintf(stderr, "Needs a filename as an argument.\n");
> > > exit(1);
> > > }
> > >
> > > filename = argv[1];
> > >
> > > pagesize = getpagesize();
> > > bufsize = READSIZE;
> > > if (WRITESIZE > READSIZE)
> > > bufsize = WRITESIZE;
> > > err = posix_memalign((void**) &iobuf, pagesize, bufsize);
> > > if (err) {
> > > fprintf(stderr, "Error allocating %d aligned bytes.\n", bufsize);
> > > exit(1);
> > > }
> > >
> > > fd = open(filename, O_CREAT|O_TRUNC|O_RDWR, 0666);
> > > assert("open", fd >= 0);
> > >
> > > do {
> > > printf("Pass %d...\n", ++pass);
> > >
> > > assert("ftruncate", ftruncate(fd, BIGSIZE) == 0);
> > > fsync(fd);
> > >
> > > pid = fork();
> > > assert("fork", pid >= 0);
> > >
> > > if (!pid) {
> > > do_buffered_writes(fd, 0);
> > > exit(0);
> > > }
> > >
> > > err = do_direct_reads(filename);
> > >
> > > wait4(pid, NULL, WNOHANG, 0);
> > >
> > > if (err)
> > > break;
> > >
> > > /* Fill the file with a known pattern so that the blocks
> > > * on disk can be detected if they become exposed. */
> > > do_buffered_writes(fd, 1);
> > > fsync(fd);
> > >
> > > assert("ftruncate", ftruncate(fd, 0) == 0);
> > > fsync(fd);
> > > } while (1);
> > >
> > > return err;
> > > }
> > >
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2003-12-12 0:52 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <3FCD4B66.8090905@us.ibm.com>
2003-12-06 1:29 ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Daniel McNeil
2003-12-08 18:23 ` Daniel McNeil
2003-12-12 0:51 ` Daniel McNeil [this message]
2003-12-17 1:25 ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Daniel McNeil
2003-12-17 2:03 ` Andrew Morton
2003-12-17 19:25 ` Daniel McNeil
2003-12-17 20:17 ` Janet Morgan
2003-12-31 9:18 ` Suparna Bhattacharya
2003-12-31 9:35 ` Andrew Morton
2003-12-31 9:55 ` Suparna Bhattacharya
2003-12-31 9:59 ` Andrew Morton
2003-12-31 10:09 ` Suparna Bhattacharya
2003-12-31 10:10 ` Andrew Morton
2003-12-31 10:48 ` Suparna Bhattacharya
2003-12-31 10:53 ` Andrew Morton
2003-12-31 10:54 ` Andrew Morton
2003-12-31 11:17 ` Andrew Morton
2003-12-31 22:34 ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Daniel McNeil
2003-12-31 22:41 ` [PATCH linux-2.6.1-rc1-mm1] aiodio_fallback_bio_count.patch Daniel McNeil
2003-12-31 23:46 ` Andrew Morton
2004-01-02 5:14 ` Suparna Bhattacharya
2004-01-02 7:46 ` Andrew Morton
2004-01-05 3:55 ` Suparna Bhattacharya
2004-01-05 5:06 ` Andrew Morton
2004-01-05 5:28 ` Suparna Bhattacharya
2004-01-05 5:28 ` Andrew Morton
2004-01-05 6:06 ` Suparna Bhattacharya
2004-01-05 6:14 ` Lincoln Dale
2003-12-31 22:47 ` [PATCH linux-2.6.1-rc1-mm1] dio_isize.patch Daniel McNeil
2003-12-31 23:42 ` [PATCH linux-2.6.1-rc1-mm1] filemap_fdatawait.patch Andrew Morton
2004-01-02 4:20 ` Suparna Bhattacharya
2004-01-02 4:36 ` Andrew Morton
2004-01-02 5:50 ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Suparna Bhattacharya
2004-01-02 7:31 ` Andrew Morton
2004-01-05 13:49 ` Marcelo Tosatti
2004-01-05 20:27 ` Andrew Morton
2004-03-29 15:44 ` Marcelo Tosatti
2004-01-11 23:14 ` Janet Morgan
2004-01-11 23:44 ` Andrew Morton
2004-01-12 18:00 ` filemap_fdatawait.patch Daniel McNeil
2004-01-12 19:39 ` [PATCH linux-2.6.0-test10-mm1] filemap_fdatawait.patch Janet Morgan
2004-01-12 19:46 ` Daniel McNeil
2004-01-13 4:12 ` Janet Morgan
2003-12-30 4:53 ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Suparna Bhattacharya
2003-12-31 0:29 ` Daniel McNeil
2003-12-31 6:09 ` Suparna Bhattacharya
2004-01-08 23:55 ` Daniel McNeil
2004-01-09 3:55 ` Suparna Bhattacharya
2004-02-05 1:39 ` [PATCH 2.6.2-rc3-mm1] DIO read race fix Daniel McNeil
2004-02-05 1:54 ` Badari Pulavarty
2004-02-05 2:07 ` Andrew Morton
2004-02-05 2:54 ` Janet Morgan
2004-02-05 3:19 ` Andrew Morton
2004-02-05 3:43 ` Suparna Bhattacharya
2004-02-05 5:33 ` Andrew Morton
2004-02-05 17:52 ` Daniel McNeil
2004-02-05 18:53 ` Badari Pulavarty
2004-03-29 15:41 ` [PATCH linux-2.6.0-test10-mm1] dio-read-race-fix Suparna Bhattacharya
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1071190292.1937.13.camel@ibm-c.pdx.osdl.net \
--to=daniel@osdl.org \
--cc=akpm@osdl.org \
--cc=janetmor@us.ibm.com \
--cc=linux-aio@kvack.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbadari@us.ibm.com \
--cc=suparna@in.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).