All of lore.kernel.org
 help / color / mirror / Atom feed
* Direct IO for fat
@ 2007-02-08 11:28 Jan Kara
  2007-02-08 15:44 ` OGAWA Hirofumi
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2007-02-08 11:28 UTC (permalink / raw)
  To: hirofumi; +Cc: linux-kernel

  Hello,

  I've noticed that extending a file using direct IO fails for FAT with
EINVAL. It's basically because of the following code in fat_direct_IO():

if (rw == WRITE) {
        /*
         * FIXME: blockdev_direct_IO() doesn't use
         * ->prepare_write(),
         * so we need to update the ->mmu_private to block
         * boundary.
         *
         * But we must fill the remaining area or hole by nul for
         * updating ->mmu_private.
         */
        loff_t size = offset + iov_length(iov, nr_segs);
        if (MSDOS_I(inode)->mmu_private < size)
                return -EINVAL;
}

  But isn't this check bogus? blockdev_direct_IO writes only to space that
is already allocated and stops as soon as it needs to extend the file
(further extension is then handled by buffered writes). So it should
already do what it needed for FAT. Thanks for an answer in advance.

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Direct IO for fat
  2007-02-08 11:28 Direct IO for fat Jan Kara
@ 2007-02-08 15:44 ` OGAWA Hirofumi
  2007-02-08 16:10   ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: OGAWA Hirofumi @ 2007-02-08 15:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel

Jan Kara <jack@suse.cz> writes:

>   Hello,

Hello,

>   I've noticed that extending a file using direct IO fails for FAT with
> EINVAL. It's basically because of the following code in fat_direct_IO():
>
> if (rw == WRITE) {
>         /*
>          * FIXME: blockdev_direct_IO() doesn't use
>          * ->prepare_write(),
>          * so we need to update the ->mmu_private to block
>          * boundary.
>          *
>          * But we must fill the remaining area or hole by nul for
>          * updating ->mmu_private.
>          */
>         loff_t size = offset + iov_length(iov, nr_segs);
>         if (MSDOS_I(inode)->mmu_private < size)
>                 return -EINVAL;
> }
>
>   But isn't this check bogus? blockdev_direct_IO writes only to space that
> is already allocated and stops as soon as it needs to extend the file
> (further extension is then handled by buffered writes). So it should
> already do what it needed for FAT. Thanks for an answer in advance.

FAT has to fill the hole completely, but DIO doesn't seems to do.

e.g.
        fd = open("file", O_WRONLY | O_CREAT | O_TRUNC);
        write(fd, buf, 512);
        lseek(fd, 10000, SEEK_SET);
        write(fd, buf, 512);

We need to allocate the blocks on 512 ~ 10000, and fill it with zero.
However, I think DIO doesn't fill it. If I'm missing something, please
let me know, I'll kill that check.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Direct IO for fat
  2007-02-08 15:44 ` OGAWA Hirofumi
@ 2007-02-08 16:10   ` Jan Kara
  2007-02-08 16:40     ` OGAWA Hirofumi
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2007-02-08 16:10 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: linux-kernel

On Fri 09-02-07 00:44:06, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
  Hello,

> >   I've noticed that extending a file using direct IO fails for FAT with
> > EINVAL. It's basically because of the following code in fat_direct_IO():
> >
> > if (rw == WRITE) {
> >         /*
> >          * FIXME: blockdev_direct_IO() doesn't use
> >          * ->prepare_write(),
> >          * so we need to update the ->mmu_private to block
> >          * boundary.
> >          *
> >          * But we must fill the remaining area or hole by nul for
> >          * updating ->mmu_private.
> >          */
> >         loff_t size = offset + iov_length(iov, nr_segs);
> >         if (MSDOS_I(inode)->mmu_private < size)
> >                 return -EINVAL;
> > }
> >
> >   But isn't this check bogus? blockdev_direct_IO writes only to space that
> > is already allocated and stops as soon as it needs to extend the file
> > (further extension is then handled by buffered writes). So it should
> > already do what it needed for FAT. Thanks for an answer in advance.
> 
> FAT has to fill the hole completely, but DIO doesn't seems to do.
> 
> e.g.
>         fd = open("file", O_WRONLY | O_CREAT | O_TRUNC);
>         write(fd, buf, 512);
>         lseek(fd, 10000, SEEK_SET);
>         write(fd, buf, 512);
> 
> We need to allocate the blocks on 512 ~ 10000, and fill it with zero.
> However, I think DIO doesn't fill it. If I'm missing something, please
> let me know, I'll kill that check.
  I know. DIO doesn't do it. But the point is that if blockdev_direct_IO
finds out it should allocate new blocks, it exits without allocating them.
Then in __generic_file_aio_write_nolock() if we find out that we did not
write everything in generic_file_direct_write(), we just call
generic_file_buffered_write() to write the unwritten part.
  Hence, in case you describe above, the second write() finds out that
block is not allocated and eventually everything falls back to calling
generic_file_buffered_write() which calls prepare_write() and everything is
happy.

								Honza
> 
> Thanks.
> -- 
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Direct IO for fat
  2007-02-08 16:10   ` Jan Kara
@ 2007-02-08 16:40     ` OGAWA Hirofumi
  2007-02-08 17:35       ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: OGAWA Hirofumi @ 2007-02-08 16:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel

Jan Kara <jack@suse.cz> writes:

>> FAT has to fill the hole completely, but DIO doesn't seems to do.
>> 
>> e.g.
>>         fd = open("file", O_WRONLY | O_CREAT | O_TRUNC);
>>         write(fd, buf, 512);
>>         lseek(fd, 10000, SEEK_SET);
>>         write(fd, buf, 512);
>> 
>> We need to allocate the blocks on 512 ~ 10000, and fill it with zero.
>> However, I think DIO doesn't fill it. If I'm missing something, please
>> let me know, I'll kill that check.
>   I know. DIO doesn't do it. But the point is that if blockdev_direct_IO
> finds out it should allocate new blocks, it exits without allocating them.
> Then in __generic_file_aio_write_nolock() if we find out that we did not
> write everything in generic_file_direct_write(), we just call
> generic_file_buffered_write() to write the unwritten part.
>   Hence, in case you describe above, the second write() finds out that
> block is not allocated and eventually everything falls back to calling
> generic_file_buffered_write() which calls prepare_write() and everything is
> happy.

I see. But sorry, I can't see where is preventing it... Finally, I
think we do the following on second write(2).

This is write, so create == 1, and ->lock_type == DIO_LOCKING,
and dio->block_in_file > ->i_size, so DIO callback fat_get_block() with
create == 1.

Then fat_get_block() seems to allocate block without fill hole,
because it can't know caller is prepre_write or not...
Well, anyway I'll test it on weekend. Thanks.

-> blockdev_direct_IO()
  -> direct_io_worker()
    -> do_direct_IO()
      -> get_more_blocks()

		create = dio->rw & WRITE;
		if (dio->lock_type == DIO_LOCKING) {
			if (dio->block_in_file < (i_size_read(dio->inode) >>
							dio->blkbits))
				create = 0;
		} else if (dio->lock_type == DIO_NO_LOCKING) {
			create = 0;
		}

		/*
		 * For writes inside i_size we forbid block creations: only
		 * overwrites are permitted.  We fall back to buffered writes
		 * at a higher level for inside-i_size block-instantiating
		 * writes.
		 */
		ret = (*dio->get_block)(dio->inode, fs_startblk,
						map_bh, create);
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Direct IO for fat
  2007-02-08 16:40     ` OGAWA Hirofumi
@ 2007-02-08 17:35       ` Jan Kara
  2007-02-08 18:41         ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2007-02-08 17:35 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: linux-kernel

On Fri 09-02-07 01:40:31, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> FAT has to fill the hole completely, but DIO doesn't seems to do.
> >> 
> >> e.g.
> >>         fd = open("file", O_WRONLY | O_CREAT | O_TRUNC);
> >>         write(fd, buf, 512);
> >>         lseek(fd, 10000, SEEK_SET);
> >>         write(fd, buf, 512);
> >> 
> >> We need to allocate the blocks on 512 ~ 10000, and fill it with zero.
> >> However, I think DIO doesn't fill it. If I'm missing something, please
> >> let me know, I'll kill that check.
> >   I know. DIO doesn't do it. But the point is that if blockdev_direct_IO
> > finds out it should allocate new blocks, it exits without allocating them.
> > Then in __generic_file_aio_write_nolock() if we find out that we did not
> > write everything in generic_file_direct_write(), we just call
> > generic_file_buffered_write() to write the unwritten part.
> >   Hence, in case you describe above, the second write() finds out that
> > block is not allocated and eventually everything falls back to calling
> > generic_file_buffered_write() which calls prepare_write() and everything is
> > happy.
> 
> I see. But sorry, I can't see where is preventing it... Finally, I
> think we do the following on second write(2).
> 
> This is write, so create == 1, and ->lock_type == DIO_LOCKING,
> and dio->block_in_file > ->i_size, so DIO callback fat_get_block() with
> create == 1.
  I think you misread the code - see below.

> Then fat_get_block() seems to allocate block without fill hole,
> because it can't know caller is prepre_write or not...
> Well, anyway I'll test it on weekend. Thanks.
> 
> -> blockdev_direct_IO()
>   -> direct_io_worker()
>     -> do_direct_IO()
>       -> get_more_blocks()
> 
> 		create = dio->rw & WRITE;
  Here, create == 1.

> 		if (dio->lock_type == DIO_LOCKING) {
> 			if (dio->block_in_file < (i_size_read(dio->inode) >>
> 							dio->blkbits))
> 				create = 0;
  But here create was reset back to 0 - exactly because
dio->block_in_file > i_size...

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Direct IO for fat
  2007-02-08 17:35       ` Jan Kara
@ 2007-02-08 18:41         ` Jan Kara
  2007-02-08 19:53           ` OGAWA Hirofumi
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2007-02-08 18:41 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: linux-kernel

> On Fri 09-02-07 01:40:31, OGAWA Hirofumi wrote:
> > Jan Kara <jack@suse.cz> writes:
> > 
> > >> FAT has to fill the hole completely, but DIO doesn't seems to do.
> > >> 
> > >> e.g.
> > >>         fd = open("file", O_WRONLY | O_CREAT | O_TRUNC);
> > >>         write(fd, buf, 512);
> > >>         lseek(fd, 10000, SEEK_SET);
> > >>         write(fd, buf, 512);
> > >> 
> > >> We need to allocate the blocks on 512 ~ 10000, and fill it with zero.
> > >> However, I think DIO doesn't fill it. If I'm missing something, please
> > >> let me know, I'll kill that check.
> > >   I know. DIO doesn't do it. But the point is that if blockdev_direct_IO
> > > finds out it should allocate new blocks, it exits without allocating them.
> > > Then in __generic_file_aio_write_nolock() if we find out that we did not
> > > write everything in generic_file_direct_write(), we just call
> > > generic_file_buffered_write() to write the unwritten part.
> > >   Hence, in case you describe above, the second write() finds out that
> > > block is not allocated and eventually everything falls back to calling
> > > generic_file_buffered_write() which calls prepare_write() and everything is
> > > happy.
> > 
> > I see. But sorry, I can't see where is preventing it... Finally, I
> > think we do the following on second write(2).
> > 
> > This is write, so create == 1, and ->lock_type == DIO_LOCKING,
> > and dio->block_in_file > ->i_size, so DIO callback fat_get_block() with
> > create == 1.
>   I think you misread the code - see below.
> 
> > Then fat_get_block() seems to allocate block without fill hole,
> > because it can't know caller is prepre_write or not...
> > Well, anyway I'll test it on weekend. Thanks.
> > 
> > -> blockdev_direct_IO()
> >   -> direct_io_worker()
> >     -> do_direct_IO()
> >       -> get_more_blocks()
> > 
> > 		create = dio->rw & WRITE;
>   Here, create == 1.
> 
> > 		if (dio->lock_type == DIO_LOCKING) {
> > 			if (dio->block_in_file < (i_size_read(dio->inode) >>
> > 							dio->blkbits))
> > 				create = 0;
>   But here create was reset back to 0 - exactly because
> dio->block_in_file > i_size...
  Obviously, I'm blind and you're right ;) This test is not satisfied
and so create == 1.
  But still it would seem better to me to return 0 from fat_direct_IO()
instead of EINVAL so that write falls back to a buffered one, instead
returning the error...

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Direct IO for fat
  2007-02-08 18:41         ` Jan Kara
@ 2007-02-08 19:53           ` OGAWA Hirofumi
  2007-02-09  7:19             ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: OGAWA Hirofumi @ 2007-02-08 19:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel

Jan Kara <jack@suse.cz> writes:

>> > -> blockdev_direct_IO()
>> >   -> direct_io_worker()
>> >     -> do_direct_IO()
>> >       -> get_more_blocks()
>> > 
>> > 		create = dio->rw & WRITE;
>>   Here, create == 1.
>> 
>> > 		if (dio->lock_type == DIO_LOCKING) {
>> > 			if (dio->block_in_file < (i_size_read(dio->inode) >>
>> > 							dio->blkbits))
>> > 				create = 0;
>>   But here create was reset back to 0 - exactly because
>> dio->block_in_file > i_size...
>   Obviously, I'm blind and you're right ;) This test is not satisfied
> and so create == 1.
>   But still it would seem better to me to return 0 from fat_direct_IO()
> instead of EINVAL so that write falls back to a buffered one, instead
> returning the error...

I see. When I wrote this, I thought kernel should use DIO to write if
user sets O_DIRECT. Because the wrong alignment request isn't fallback
to buffered-write, and it's also returns EINVAL.

But I don't have strong opinion here. If anyone (you) has any request
of it, I'll not have objection to it.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Direct IO for fat
  2007-02-08 19:53           ` OGAWA Hirofumi
@ 2007-02-09  7:19             ` Jan Kara
  2007-02-09 13:42               ` OGAWA Hirofumi
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2007-02-09  7:19 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: linux-kernel

On Fri 09-02-07 04:53:02, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> > -> blockdev_direct_IO()
> >> >   -> direct_io_worker()
> >> >     -> do_direct_IO()
> >> >       -> get_more_blocks()
> >> > 
> >> > 		create = dio->rw & WRITE;
> >>   Here, create == 1.
> >> 
> >> > 		if (dio->lock_type == DIO_LOCKING) {
> >> > 			if (dio->block_in_file < (i_size_read(dio->inode) >>
> >> > 							dio->blkbits))
> >> > 				create = 0;
> >>   But here create was reset back to 0 - exactly because
> >> dio->block_in_file > i_size...
> >   Obviously, I'm blind and you're right ;) This test is not satisfied
> > and so create == 1.
> >   But still it would seem better to me to return 0 from fat_direct_IO()
> > instead of EINVAL so that write falls back to a buffered one, instead
> > returning the error...
> 
> I see. When I wrote this, I thought kernel should use DIO to write if
> user sets O_DIRECT. Because the wrong alignment request isn't fallback
> to buffered-write, and it's also returns EINVAL.
  I understand. It's just that I've got some surprised users who could not
track why the hell does write() return EINVAL to them when they have
everything alligned and the same code works for EXT3 :). Of course, nothing
guarantees that FAT should behave the same way as EXT3 but I can understand
they were surprised (I had to look in the code too).
  I also don't have a strong opinion whether we should fallback to buffered
write automagically or whether we should return EINVAL and let the user fall
back to the buffered write himself. But I'd slightly prefer the first
option.

									Honza

-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Direct IO for fat
  2007-02-09  7:19             ` Jan Kara
@ 2007-02-09 13:42               ` OGAWA Hirofumi
  0 siblings, 0 replies; 9+ messages in thread
From: OGAWA Hirofumi @ 2007-02-09 13:42 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel

Jan Kara <jack@suse.cz> writes:

>> I see. When I wrote this, I thought kernel should use DIO to write if
>> user sets O_DIRECT. Because the wrong alignment request isn't fallback
>> to buffered-write, and it's also returns EINVAL.
>   I understand. It's just that I've got some surprised users who could not
> track why the hell does write() return EINVAL to them when they have
> everything alligned and the same code works for EXT3 :). Of course, nothing
> guarantees that FAT should behave the same way as EXT3 but I can understand
> they were surprised (I had to look in the code too).
>   I also don't have a strong opinion whether we should fallback to buffered
> write automagically or whether we should return EINVAL and let the user fall
> back to the buffered write himself. But I'd slightly prefer the first
> option.

Hm, ok. I'll change EINVAL to zero as soon as possible after test.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-02-09 13:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-08 11:28 Direct IO for fat Jan Kara
2007-02-08 15:44 ` OGAWA Hirofumi
2007-02-08 16:10   ` Jan Kara
2007-02-08 16:40     ` OGAWA Hirofumi
2007-02-08 17:35       ` Jan Kara
2007-02-08 18:41         ` Jan Kara
2007-02-08 19:53           ` OGAWA Hirofumi
2007-02-09  7:19             ` Jan Kara
2007-02-09 13:42               ` OGAWA Hirofumi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.