linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] RFC directio: partial writes support
@ 2010-02-25 12:45 Dmitry Monakhov
  2010-02-27 11:10 ` Dmitry Monakhov
  2010-03-01 23:21 ` Andrew Morton
  0 siblings, 2 replies; 6+ messages in thread
From: Dmitry Monakhov @ 2010-02-25 12:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 663 bytes --]

Can someone please describe me why directio deny partial writes.
For example if someone try to write 100Mb but file system has less
data it return ENOSPC in the middle of block allocation.
All allocated blocks will be truncated (it may be 100Mb -4k) end
ENOSPC will be returned. As far as i remember direct_io always act
like this, but i never asked why?
Why do we have to give up all the progress we made?
In fact partial writes are possible in case of holes, when we 
fall back to buffered write. XFS implemented partial writes.

I've done trivial changes and it works like charm. 
Let's enable partial writes support and allow caller to define
this behavior.


[-- Attachment #2: 0001-direct_io-Allow-partial-writes.patch --]
[-- Type: text/plain, Size: 2189 bytes --]

>From 4a72c4a61e133140750d05853b8dafecd8ef5d87 Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Thu, 25 Feb 2010 15:14:48 +0300
Subject: [PATCH] direct_io: Allow partial writes

Current direct io allocation behavior is inconvenient. Partial writes
are not supported. If we try to issue 10Mb chunk, but only 5Mb is
available then we will allocate thees 5Mb until ENOSPC, and then
drop such space and return ENOSPC.
But in fact partial writes are possible in case of holes.
Seems that there is no enough reason to deny partial writes.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/direct-io.c     |    4 ++++
 fs/ext4/inode.c    |   11 +++++++----
 include/linux/fs.h |    2 ++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index e82adc2..250a041 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -229,6 +229,10 @@ static int dio_complete(struct dio *dio, loff_t offset, int ret)
 		ret = 0;
 
 	if (dio->result) {
+		/* Ignore error if we have written some data */
+		if (dio->flags & DIO_PARTIAL_WRITE)
+			ret = 0;
+
 		transferred = dio->result;
 
 		/* Check for short read case */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 218ea0b..8c00127 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3447,10 +3447,13 @@ retry:
 				 offset, nr_segs,
 				 ext4_get_block, NULL);
 	else
-		ret = blockdev_direct_IO(rw, iocb, inode,
-				 inode->i_sb->s_bdev, iov,
-				 offset, nr_segs,
-				 ext4_get_block, NULL);
+		ret = __blockdev_direct_IO(rw, iocb, inode,
+					inode->i_sb->s_bdev, iov,
+					offset, nr_segs,
+					ext4_get_block, NULL,
+					DIO_LOCKING | DIO_SKIP_HOLES |
+					DIO_PARTIAL_WRITE);
+
 	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
 		goto retry;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9147ca8..d887685 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2259,6 +2259,8 @@ enum {
 
 	/* filesystem does not support filling holes */
 	DIO_SKIP_HOLES	= 0x02,
+	/* allow partial writes */
+	DIO_PARTIAL_WRITE = 0x04,
 };
 
 static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [patch] RFC directio: partial writes support
  2010-02-25 12:45 [patch] RFC directio: partial writes support Dmitry Monakhov
@ 2010-02-27 11:10 ` Dmitry Monakhov
  2010-03-01 23:21 ` Andrew Morton
  1 sibling, 0 replies; 6+ messages in thread
From: Dmitry Monakhov @ 2010-02-27 11:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, Andrew Morton

Dmitry Monakhov <dmonakhov@openvz.org> writes:

> Can someone please describe me why directio deny partial writes.
> For example if someone try to write 100Mb but file system has less
> data it return ENOSPC in the middle of block allocation.
> All allocated blocks will be truncated (it may be 100Mb -4k) end
> ENOSPC will be returned. As far as i remember direct_io always act
> like this, but i never asked why?
> Why do we have to give up all the progress we made?
> In fact partial writes are possible in case of holes, when we 
> fall back to buffered write. XFS implemented partial writes.
>
> I've done trivial changes and it works like charm. 
> Let's enable partial writes support and allow caller to define
> this behavior.
add Andrew to cc:

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] RFC directio: partial writes support
  2010-02-25 12:45 [patch] RFC directio: partial writes support Dmitry Monakhov
  2010-02-27 11:10 ` Dmitry Monakhov
@ 2010-03-01 23:21 ` Andrew Morton
  2010-03-02  9:25   ` Nick Piggin
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2010-03-01 23:21 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-kernel, linux-fsdevel

On Thu, 25 Feb 2010 15:45:58 +0300
Dmitry Monakhov <dmonakhov@openvz.org> wrote:

> Can someone please describe me why directio deny partial writes.
> For example if someone try to write 100Mb but file system has less
> data it return ENOSPC in the middle of block allocation.
> All allocated blocks will be truncated (it may be 100Mb -4k) end
> ENOSPC will be returned. As far as i remember direct_io always act
> like this, but i never asked why?
> Why do we have to give up all the progress we made?
> In fact partial writes are possible in case of holes, when we 
> fall back to buffered write. XFS implemented partial writes.

The problem with direct-io writes is that the writes don't necessarily
complete in file-offset-ascending order.  So if we've issued 50 write
BIOs and then hit an EIO on a BIO then we could have a hunk of
unwritten data with newly-writted data either side of it.  If we get a
bunch of discontiguous EIO BIOs coming in then the problem gets even
messier - we have a span of disk which has a random mix of
correctly-written and not-correctly-written runs of sectors.  What do
we do with that?

The code _could_ perhaps go back and crawl through the request and
identify the number of successfully-written bytes between
start-of-request and first-EIO and then return that.  But we didn't
bother.


ENOSPC errors are handled via the same code path and hence got
deoptimised due to this EIO handling.  We could perhaps improve the
ENOSPC handling along the lines you propose, as long as we
appropriately take care of EIO considerations.  Which, afacit, your
patch didn't do.

The presence of opt-in DIO_PARTIAL_WRITE thing is rather unfortunate -
it would be better to make this change for all filesystems in one hit. 
But I guess DIO_PARTIAL_WRITE permits us to migrate filesystems
one-at-a-time as testing permits.  But the aim should be to remove
DIO_PARTIAL_WRITE altogether once all the conversion and testing is
completed.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] RFC directio: partial writes support
  2010-03-01 23:21 ` Andrew Morton
@ 2010-03-02  9:25   ` Nick Piggin
  2010-03-02 11:34     ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2010-03-02  9:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dmitry Monakhov, linux-kernel, linux-fsdevel

On Mon, Mar 01, 2010 at 03:21:49PM -0800, Andrew Morton wrote:
> On Thu, 25 Feb 2010 15:45:58 +0300
> Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> 
> > Can someone please describe me why directio deny partial writes.
> > For example if someone try to write 100Mb but file system has less
> > data it return ENOSPC in the middle of block allocation.
> > All allocated blocks will be truncated (it may be 100Mb -4k) end
> > ENOSPC will be returned. As far as i remember direct_io always act
> > like this, but i never asked why?
> > Why do we have to give up all the progress we made?
> > In fact partial writes are possible in case of holes, when we 
> > fall back to buffered write. XFS implemented partial writes.
> 
> The problem with direct-io writes is that the writes don't necessarily
> complete in file-offset-ascending order.  So if we've issued 50 write
> BIOs and then hit an EIO on a BIO then we could have a hunk of
> unwritten data with newly-writted data either side of it.  If we get a
> bunch of discontiguous EIO BIOs coming in then the problem gets even
> messier - we have a span of disk which has a random mix of
> correctly-written and not-correctly-written runs of sectors.  What do
> we do with that?

Hmm, what if we're filling in a hole with direct IO? I don't see where
blocks allocated in DIO code will be trimmed on a failed write (because
it's within isize). This could cause uninitalized data of the block to
leak couldn't it?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] RFC directio: partial writes support
  2010-03-02  9:25   ` Nick Piggin
@ 2010-03-02 11:34     ` Jan Kara
  2010-03-02 12:37       ` Nick Piggin
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2010-03-02 11:34 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, Dmitry Monakhov, linux-kernel, linux-fsdevel

On Tue 02-03-10 20:25:02, Nick Piggin wrote:
> On Mon, Mar 01, 2010 at 03:21:49PM -0800, Andrew Morton wrote:
> > On Thu, 25 Feb 2010 15:45:58 +0300
> > Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> > 
> > > Can someone please describe me why directio deny partial writes.
> > > For example if someone try to write 100Mb but file system has less
> > > data it return ENOSPC in the middle of block allocation.
> > > All allocated blocks will be truncated (it may be 100Mb -4k) end
> > > ENOSPC will be returned. As far as i remember direct_io always act
> > > like this, but i never asked why?
> > > Why do we have to give up all the progress we made?
> > > In fact partial writes are possible in case of holes, when we 
> > > fall back to buffered write. XFS implemented partial writes.
> > 
> > The problem with direct-io writes is that the writes don't necessarily
> > complete in file-offset-ascending order.  So if we've issued 50 write
> > BIOs and then hit an EIO on a BIO then we could have a hunk of
> > unwritten data with newly-writted data either side of it.  If we get a
> > bunch of discontiguous EIO BIOs coming in then the problem gets even
> > messier - we have a span of disk which has a random mix of
> > correctly-written and not-correctly-written runs of sectors.  What do
> > we do with that?
> 
> Hmm, what if we're filling in a hole with direct IO? I don't see where
> blocks allocated in DIO code will be trimmed on a failed write (because
> it's within isize). This could cause uninitalized data of the block to
> leak couldn't it?
  The trick is that blockdev_direct_IO is defined to pass
DIO_SKIP_HOLES to __blockdev_direct_IO. Thus e.g. ext2 or ext3 will just
fail the direct IO if there is a hole and we fall back to buffered IO
which should handle that just fine.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] RFC directio: partial writes support
  2010-03-02 11:34     ` Jan Kara
@ 2010-03-02 12:37       ` Nick Piggin
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2010-03-02 12:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, Dmitry Monakhov, linux-kernel, linux-fsdevel

On Tue, Mar 02, 2010 at 12:34:06PM +0100, Jan Kara wrote:
> On Tue 02-03-10 20:25:02, Nick Piggin wrote:
> > On Mon, Mar 01, 2010 at 03:21:49PM -0800, Andrew Morton wrote:
> > > On Thu, 25 Feb 2010 15:45:58 +0300
> > > Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> > > 
> > > > Can someone please describe me why directio deny partial writes.
> > > > For example if someone try to write 100Mb but file system has less
> > > > data it return ENOSPC in the middle of block allocation.
> > > > All allocated blocks will be truncated (it may be 100Mb -4k) end
> > > > ENOSPC will be returned. As far as i remember direct_io always act
> > > > like this, but i never asked why?
> > > > Why do we have to give up all the progress we made?
> > > > In fact partial writes are possible in case of holes, when we 
> > > > fall back to buffered write. XFS implemented partial writes.
> > > 
> > > The problem with direct-io writes is that the writes don't necessarily
> > > complete in file-offset-ascending order.  So if we've issued 50 write
> > > BIOs and then hit an EIO on a BIO then we could have a hunk of
> > > unwritten data with newly-writted data either side of it.  If we get a
> > > bunch of discontiguous EIO BIOs coming in then the problem gets even
> > > messier - we have a span of disk which has a random mix of
> > > correctly-written and not-correctly-written runs of sectors.  What do
> > > we do with that?
> > 
> > Hmm, what if we're filling in a hole with direct IO? I don't see where
> > blocks allocated in DIO code will be trimmed on a failed write (because
> > it's within isize). This could cause uninitalized data of the block to
> > leak couldn't it?
>   The trick is that blockdev_direct_IO is defined to pass
> DIO_SKIP_HOLES to __blockdev_direct_IO. Thus e.g. ext2 or ext3 will just
> fail the direct IO if there is a hole and we fall back to buffered IO
> which should handle that just fine.

OK yes I see, I missed that.



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-03-02 12:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-25 12:45 [patch] RFC directio: partial writes support Dmitry Monakhov
2010-02-27 11:10 ` Dmitry Monakhov
2010-03-01 23:21 ` Andrew Morton
2010-03-02  9:25   ` Nick Piggin
2010-03-02 11:34     ` Jan Kara
2010-03-02 12:37       ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).