linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.5 O)DIRECT problem
@ 2002-10-04 20:17 Steve Lord
  2002-10-04 20:29 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Steve Lord @ 2002-10-04 20:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel

Hi Andrew,

I ran into a problem with 2.5's O_DIRECT read path,

	generic_file_direct_IO usually ends up in generic_direct_IO
	this does bounds checking on the I/O and then flushes any
	cached data.

	Once we return to generic_file_direct_IO we unconditionally
	call invalidate_inode_pages2 if there are any cached pages.

If we make a non-aligned O_DIRECT read call, the end result is we
call invalidate_inode_pages2, but we do not do the filemap_fdatawrite,
filemap_fdatawait calls. End result is we throw the buffered data away.

Either the flush needs to happen before the bounds checks, or the
invalidate should only be done on a successful write. It looks 
pretty hard to detect the latter case with the current structure,
we can get EINVAL from the bounds check and possibly from an
aligned, but invalid memory address being passed in.

This is worse for xfs than for other filesystems since if we do the
invalidate without the flush first, we end up with a delayed allocate
extent in memory and no data to flush into it. Various spots in the
filesystem dislike this.

I can fix it inside xfs by doing my own flush and invalidate first, but
the generic code should really be fixed.

Thoughts?

Steve

-- 

Steve Lord                                      voice: +1-651-683-3511
Principal Engineer, Filesystem Software         email: lord@sgi.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.5 O)DIRECT problem
  2002-10-04 20:17 2.5 O)DIRECT problem Steve Lord
@ 2002-10-04 20:29 ` Andrew Morton
  2002-10-04 20:38   ` Steve Lord
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2002-10-04 20:29 UTC (permalink / raw)
  To: Steve Lord; +Cc: Linux Kernel

Steve Lord wrote:
> 
> Hi Andrew,
> 
> I ran into a problem with 2.5's O_DIRECT read path,
> 
>         generic_file_direct_IO usually ends up in generic_direct_IO
>         this does bounds checking on the I/O and then flushes any
>         cached data.
> 
>         Once we return to generic_file_direct_IO we unconditionally
>         call invalidate_inode_pages2 if there are any cached pages.
> 
> If we make a non-aligned O_DIRECT read call, the end result is we
> call invalidate_inode_pages2, but we do not do the filemap_fdatawrite,
> filemap_fdatawait calls. End result is we throw the buffered data away.

Well you could always switch to Linus' current BK tree, in
which invalidate_inode_pages2() is a no-op (whoops).

> Either the flush needs to happen before the bounds checks, or the
> invalidate should only be done on a successful write. It looks
> pretty hard to detect the latter case with the current structure,
> we can get EINVAL from the bounds check and possibly from an
> aligned, but invalid memory address being passed in.

Yes I agree; let's just do the sync before any checks.

I think it should be moved into generic_file_direct_IO(),
because that's the place where the invalidation happens, yes?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.5 O)DIRECT problem
  2002-10-04 20:29 ` Andrew Morton
@ 2002-10-04 20:38   ` Steve Lord
  2002-10-04 20:51     ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Steve Lord @ 2002-10-04 20:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel

On Fri, 2002-10-04 at 15:29, Andrew Morton wrote:
> Steve Lord wrote:
> > Either the flush needs to happen before the bounds checks, or the
> > invalidate should only be done on a successful write. It looks
> > pretty hard to detect the latter case with the current structure,
> > we can get EINVAL from the bounds check and possibly from an
> > aligned, but invalid memory address being passed in.
> 
> Yes I agree; let's just do the sync before any checks.
> 
> I think it should be moved into generic_file_direct_IO(),
> because that's the place where the invalidation happens, yes?

OK, sounds good to me, I will let my tests churn away on that
version and see what happens. I think something else is doing
the same thing to me elsewhere, but it might well be an xfs
specific case.

Steve

-- 

Steve Lord                                      voice: +1-651-683-3511
Principal Engineer, Filesystem Software         email: lord@sgi.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.5 O)DIRECT problem
  2002-10-04 20:38   ` Steve Lord
@ 2002-10-04 20:51     ` Andrew Morton
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2002-10-04 20:51 UTC (permalink / raw)
  To: Steve Lord; +Cc: Linux Kernel

Steve Lord wrote:
> 
> On Fri, 2002-10-04 at 15:29, Andrew Morton wrote:
> > Steve Lord wrote:
> > > Either the flush needs to happen before the bounds checks, or the
> > > invalidate should only be done on a successful write. It looks
> > > pretty hard to detect the latter case with the current structure,
> > > we can get EINVAL from the bounds check and possibly from an
> > > aligned, but invalid memory address being passed in.
> >
> > Yes I agree; let's just do the sync before any checks.
> >
> > I think it should be moved into generic_file_direct_IO(),
> > because that's the place where the invalidation happens, yes?
> 
> OK, sounds good to me, I will let my tests churn away on that
> version and see what happens. I think something else is doing
> the same thing to me elsewhere, but it might well be an xfs
> specific case.

I queued the below patch.

BTW, I'm sitting on a patch from Badari which allows the direct-io
code to perform 512-byte-aligned and multiple-of-512-byte-sized
IO against a 4k blocksize filesystem.  We make (reasonable)
assumptions about the return value from get_block(): scale it
from softblocksize up to sectors and then add an offset. 

And we do weird things with the ZERO_PAGE to cater for the case
where the filesystem block is buffer_new() - use bits of the
ZERO_PAGE to fill out the gaps in the BIOs to zero out bits of
disk blocks.

We're also currently requiring that the filesystem pass its backing
block_device * into generic_direct_IO so we can run bdev_hardsectsize()
at the right time, which I'm not 100% happy with.

But relaxing the 4k alignment requirement has great value, so
we'll persist with that.  I'll include that in the next -mm;
you may want to take a look at it.



patch:

If the alignment checks in generic_direct_IO() fail, we end up not
forcing writeback of dirty pagecache pages, but we still run
invalidate_inode_pages2().  The net result is that dirty pagecache gets
incorrectly removed.  I guess this will expose unwritten disk blocks.

So move the sync up into generic_file_direct_IO(), where we perform the
invalidation.  So we know that pagecache and disk are in sync before we
do anything else.



 fs/direct-io.c |   21 ++++++++++-----------
 1 files changed, 10 insertions(+), 11 deletions(-)

--- 2.5.40/fs/direct-io.c~direct-io-invalidation-fix	Fri Oct  4 13:41:37 2002
+++ 2.5.40-akpm/fs/direct-io.c	Fri Oct  4 13:41:37 2002
@@ -620,13 +620,11 @@ generic_direct_IO(int rw, struct inode *
 	int seg;
 	size_t size;
 	unsigned long addr;
-	struct address_space *mapping = inode->i_mapping;
 	unsigned blocksize_mask = (1 << inode->i_blkbits) - 1;
 	ssize_t retval = -EINVAL;
 
-	if (offset & blocksize_mask) {
+	if (offset & blocksize_mask)
 		goto out;
-	}
 
 	/* Check the memory alignment.  Blocks cannot straddle pages */
 	for (seg = 0; seg < nr_segs; seg++) {
@@ -636,14 +634,6 @@ generic_direct_IO(int rw, struct inode *
 			goto out;	
 	}
 
-	if (mapping->nrpages) {
-		retval = filemap_fdatawrite(mapping);
-		if (retval == 0)
-			retval = filemap_fdatawait(mapping);
-		if (retval)
-			goto out;
-	}
-
 	retval = direct_io_worker(rw, inode, iov, offset, nr_segs, get_blocks);
 out:
 	return retval;
@@ -656,8 +646,17 @@ generic_file_direct_IO(int rw, struct in
 	struct address_space *mapping = inode->i_mapping;
 	ssize_t retval;
 
+	if (mapping->nrpages) {
+		retval = filemap_fdatawrite(mapping);
+		if (retval == 0)
+			retval = filemap_fdatawait(mapping);
+		if (retval)
+			goto out;
+	}
+
 	retval = mapping->a_ops->direct_IO(rw, inode, iov, offset, nr_segs);
 	if (inode->i_mapping->nrpages)
 		invalidate_inode_pages2(inode->i_mapping);
+out:
 	return retval;
 }

.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2002-10-04 20:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-10-04 20:17 2.5 O)DIRECT problem Steve Lord
2002-10-04 20:29 ` Andrew Morton
2002-10-04 20:38   ` Steve Lord
2002-10-04 20:51     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).