linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH] problem of cont_prepare_write()
@ 2004-11-22 10:30 OGAWA Hirofumi
  2004-11-22 10:41 ` Andrew Morton
  2004-11-22 10:46 ` Andrew Morton
  0 siblings, 2 replies; 15+ messages in thread
From: OGAWA Hirofumi @ 2004-11-22 10:30 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton; +Cc: linux-kernel

Hi,

If I do the following operation on fatfs, and my box under heavy load,

	open("testfile", O_CREAT | O_TRUNC | O_RDWR, 0664);
	lseek(fd, 500*1024*1024 - 1, SEEK_SET);
	write(fd, "\0", 1);

In cont_prepare_write(), kernel fills the hole by zero cleared page.

fs/buffer.c:cont_prepare_write:2210,
	while(page->index > (pgpos = *bytes>>PAGE_CACHE_SHIFT)) {

		[...]	

		status = __block_prepare_write(inode, new_page, zerofrom,
						PAGE_CACHE_SIZE, get_block);
		if (status)
			goto out_unmap;
		kaddr = kmap_atomic(new_page, KM_USER0);
		memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
		flush_dcache_page(new_page);
		kunmap_atomic(kaddr, KM_USER0);
		__block_commit_write(inode, new_page,
				zerofrom, PAGE_CACHE_SIZE);
		unlock_page(new_page);
		page_cache_release(new_page);
	}

But until ->commit_write(), kernel doesn't update the ->i_size. Then,
if kernel writes out that hole page before updates of ->i_size, dirty
flag of buffer_head is cleared in __block_write_full_page(). So hole
page was not writed to disk.

fs/buffer.c:__block_write_full_page:1773,
		if (block > last_block) {
			/*
			 * mapped buffers outside i_size will occur, because
			 * this page can be outside i_size when there is a
			 * truncate in progress.
			 */
			/*
			 * The buffer was zeroed by block_write_full_page()
			 */
			clear_buffer_dirty(bh);
			set_buffer_uptodate(bh);

This became cause of data corruption.

I thought that it is not good to update ->i_size before ->commit_write().
So, I wrote the following patch.  Anyone has good idea for solving this
problem?



Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---

 fs/buffer.c        |    4 ++--
 include/linux/fs.h |    4 ++++
 mm/memory.c        |    2 ++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff -puN mm/memory.c~cont_prepare_write-fix mm/memory.c
--- linux-2.6.10-rc2/mm/memory.c~cont_prepare_write-fix	2004-11-22 18:59:19.000000000 +0900
+++ linux-2.6.10-rc2-hirofumi/mm/memory.c	2004-11-22 18:59:19.000000000 +0900
@@ -1246,9 +1246,11 @@ int vmtruncate(struct inode * inode, lof
 	 */
 	if (IS_SWAPFILE(inode))
 		goto out_busy;
+	inode->i_flags |= S_TRUNCATING;
 	i_size_write(inode, offset);
 	unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
 	truncate_inode_pages(mapping, offset);
+	inode->i_flags &= ~S_TRUNCATING;
 	goto out_truncate;
 
 do_expand:
diff -puN fs/buffer.c~cont_prepare_write-fix fs/buffer.c
--- linux-2.6.10-rc2/fs/buffer.c~cont_prepare_write-fix	2004-11-22 18:59:19.000000000 +0900
+++ linux-2.6.10-rc2-hirofumi/fs/buffer.c	2004-11-22 18:59:19.000000000 +0900
@@ -1763,7 +1763,7 @@ static int __block_write_full_page(struc
 	 * handle any aliases from the underlying blockdev's mapping.
 	 */
 	do {
-		if (block > last_block) {
+		if (IS_TRUNCATING(inode) && block > last_block) {
 			/*
 			 * mapped buffers outside i_size will occur, because
 			 * this page can be outside i_size when there is a
@@ -2628,7 +2628,7 @@ int block_write_full_page(struct page *p
 
 	/* Is the page fully outside i_size? (truncate in progress) */
 	offset = i_size & (PAGE_CACHE_SIZE-1);
-	if (page->index >= end_index+1 || !offset) {
+	if (IS_TRUNCATING(inode) && (page->index >= end_index+1 || !offset)) {
 		/*
 		 * The page may have dirty, unmapped buffers.  For example,
 		 * they may have been added in ext3_writepage().  Make them
diff -puN include/linux/fs.h~cont_prepare_write-fix include/linux/fs.h
--- linux-2.6.10-rc2/include/linux/fs.h~cont_prepare_write-fix	2004-11-22 18:59:19.000000000 +0900
+++ linux-2.6.10-rc2-hirofumi/include/linux/fs.h	2004-11-22 18:59:19.000000000 +0900
@@ -151,6 +151,8 @@ extern int dir_notify_enable;
 #define S_NOCMTIME	128	/* Do not update file c/mtime */
 #define S_SWAPFILE	256	/* Do not truncate: swapon got its bmaps */
 
+#define S_TRUNCATING	(1<<31)	/* Hack: inode is truncating the data now */
+
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
  * flags just means all the inodes inherit those flags by default. It might be
@@ -185,6 +187,8 @@ extern int dir_notify_enable;
 #define IS_NOCMTIME(inode)	((inode)->i_flags & S_NOCMTIME)
 #define IS_SWAPFILE(inode)	((inode)->i_flags & S_SWAPFILE)
 
+#define IS_TRUNCATING(inode)	((inode)->i_flags & S_TRUNCATING)
+
 /* the read-only stuff doesn't really belong here, but any other place is
    probably as bad and I don't want to create yet another include file. */
 
_

-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 10:30 [RFC][PATCH] problem of cont_prepare_write() OGAWA Hirofumi
@ 2004-11-22 10:41 ` Andrew Morton
  2004-11-22 10:46 ` Andrew Morton
  1 sibling, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2004-11-22 10:41 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: torvalds, linux-kernel

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
>  If I do the following operation on fatfs, and my box under heavy load,
> 
>  	open("testfile", O_CREAT | O_TRUNC | O_RDWR, 0664);
>  	lseek(fd, 500*1024*1024 - 1, SEEK_SET);
>  	write(fd, "\0", 1);
> 
>  In cont_prepare_write(), kernel fills the hole by zero cleared page.
> 
>  fs/buffer.c:cont_prepare_write:2210,
>  	while(page->index > (pgpos = *bytes>>PAGE_CACHE_SHIFT)) {
> 
>  		[...]	
> 
>  		status = __block_prepare_write(inode, new_page, zerofrom,
>  						PAGE_CACHE_SIZE, get_block);
>  		if (status)
>  			goto out_unmap;
>  		kaddr = kmap_atomic(new_page, KM_USER0);
>  		memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
>  		flush_dcache_page(new_page);
>  		kunmap_atomic(kaddr, KM_USER0);
>  		__block_commit_write(inode, new_page,
>  				zerofrom, PAGE_CACHE_SIZE);
>  		unlock_page(new_page);
>  		page_cache_release(new_page);
>  	}
> 
>  But until ->commit_write(), kernel doesn't update the ->i_size. Then,
>  if kernel writes out that hole page before updates of ->i_size, dirty
>  flag of buffer_head is cleared in __block_write_full_page(). So hole
>  page was not writed to disk.

But the page remains locked across both the ->prepare_write() and
->commit_write() operations.  So writeback cannot get in there to call
->writepage().

The page lock should correctly synchronise the prepare_write/commit_write
and writeback functions.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 10:30 [RFC][PATCH] problem of cont_prepare_write() OGAWA Hirofumi
  2004-11-22 10:41 ` Andrew Morton
@ 2004-11-22 10:46 ` Andrew Morton
  2004-11-22 11:03   ` Anton Altaparmakov
  2004-11-22 11:38   ` OGAWA Hirofumi
  1 sibling, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2004-11-22 10:46 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: torvalds, linux-kernel

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
> 
>  		status = __block_prepare_write(inode, new_page, zerofrom,
>  						PAGE_CACHE_SIZE, get_block);
>  		if (status)
>  			goto out_unmap;
>  		kaddr = kmap_atomic(new_page, KM_USER0);
>  		memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
>  		flush_dcache_page(new_page);
>  		kunmap_atomic(kaddr, KM_USER0);
>  		__block_commit_write(inode, new_page,
>  				zerofrom, PAGE_CACHE_SIZE);
>  		unlock_page(new_page);
>  		page_cache_release(new_page);
>  	}
> 
>  But until ->commit_write(), kernel doesn't update the ->i_size. Then,
>  if kernel writes out that hole page before updates of ->i_size, dirty
>  flag of buffer_head is cleared in __block_write_full_page(). So hole
>  page was not writed to disk.

Oh I see.  After the above page is unlocked, it's temporarily outside
i_size.

Perhaps cont_prepare_write() should look to see if the zerofilled page is
outside the current i_size and if so, advance i_size to the end of the
zerofilled page prior to releasing the page lock.

We might need to run mark_inode_dirty() at some stage, or perhaps just rely
on the caller doing that in ->commit_write().

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 10:46 ` Andrew Morton
@ 2004-11-22 11:03   ` Anton Altaparmakov
  2004-11-22 21:53     ` Andrew Morton
  2004-11-22 11:38   ` OGAWA Hirofumi
  1 sibling, 1 reply; 15+ messages in thread
From: Anton Altaparmakov @ 2004-11-22 11:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: OGAWA Hirofumi, Linus Torvalds, lkml

On Mon, 2004-11-22 at 02:46 -0800, Andrew Morton wrote:
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
> >
> > 
> >  		status = __block_prepare_write(inode, new_page, zerofrom,
> >  						PAGE_CACHE_SIZE, get_block);
> >  		if (status)
> >  			goto out_unmap;
> >  		kaddr = kmap_atomic(new_page, KM_USER0);
> >  		memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
> >  		flush_dcache_page(new_page);
> >  		kunmap_atomic(kaddr, KM_USER0);
> >  		__block_commit_write(inode, new_page,
> >  				zerofrom, PAGE_CACHE_SIZE);
> >  		unlock_page(new_page);
> >  		page_cache_release(new_page);
> >  	}
> > 
> >  But until ->commit_write(), kernel doesn't update the ->i_size. Then,
> >  if kernel writes out that hole page before updates of ->i_size, dirty
> >  flag of buffer_head is cleared in __block_write_full_page(). So hole
> >  page was not writed to disk.
> 
> Oh I see.  After the above page is unlocked, it's temporarily outside
> i_size.
> 
> Perhaps cont_prepare_write() should look to see if the zerofilled page is
> outside the current i_size and if so, advance i_size to the end of the
> zerofilled page prior to releasing the page lock.

Would it be ok to modify i_size from prepare_write?  That would make my
life in NTFS a lot easier...  There are cases in NTFS where I need to do
page updates in prepare write that would benefit from an i_size update
as well rather than having to postpone the i_size update to
commit_write.  (Note commit_write would still update i_size, too, its
just that prepare write would set i_size to be up to the start of the
write because otherwise you have a potential hole between i_size and the
start of the write and at least on NTFS that causes me a lot of
headaches with resident files and non-resident files with
initialized_size != i_size that I could make a lot easier to deal with
by updating i_size in prepare_write to point to the start of the write.)

> We might need to run mark_inode_dirty() at some stage, or perhaps just rely
> on the caller doing that in ->commit_write().

Slight problem with not running mark_inode_dirty() at this point is that
if commit_write() fails for some reason (-ENOMEM springs to mind)
mark_inode_dirty() may never get run which may cause a problem,
depending on what exactly was done in prepare_write...

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 10:46 ` Andrew Morton
  2004-11-22 11:03   ` Anton Altaparmakov
@ 2004-11-22 11:38   ` OGAWA Hirofumi
  2004-11-22 21:43     ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: OGAWA Hirofumi @ 2004-11-22 11:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, linux-kernel

Andrew Morton <akpm@osdl.org> writes:

> Perhaps cont_prepare_write() should look to see if the zerofilled page is
> outside the current i_size and if so, advance i_size to the end of the
> zerofilled page prior to releasing the page lock.

Yes, my first patch was it.

Umm... however, if ->i_size is updated before ->commit_write(),
doesn't it allow access to those pages, before all write() work is
successful?

Doesn't this cause strange behavior?
(especially a failure case of prepare_write())
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 11:38   ` OGAWA Hirofumi
@ 2004-11-22 21:43     ` Andrew Morton
  2004-11-23  2:22       ` OGAWA Hirofumi
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2004-11-22 21:43 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: torvalds, linux-kernel

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
> Andrew Morton <akpm@osdl.org> writes:
> 
> > Perhaps cont_prepare_write() should look to see if the zerofilled page is
> > outside the current i_size and if so, advance i_size to the end of the
> > zerofilled page prior to releasing the page lock.
> 
> Yes, my first patch was it.

I don't understand what you mean.  Do you mean you tried that approach and
rejected it for some reason?

> Umm... however, if ->i_size is updated before ->commit_write(),
> doesn't it allow access to those pages, before all write() work is
> successful?

That's OK.  A thread which is read()ing that page will either

a) decide that the page is outside i_size, and won't read it anyway or

b) decide that the page is inside i_size and will read the page's contents.

Still, I'd be inclined to update i_size after running ->commit_write.  It
looks like we can simply replace the call to __block_commit_write() with a
call to generic_commit_write().


But none of this has anything to do with truncate, and the patch which you
sent is playing around with the truncate code and appears to be quite
unrelated.  Did you send the correct patch?  If so, what is it designed to
do?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 11:03   ` Anton Altaparmakov
@ 2004-11-22 21:53     ` Andrew Morton
  2004-11-22 23:30       ` Anton Altaparmakov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2004-11-22 21:53 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: hirofumi, torvalds, linux-kernel

Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>
> Would it be ok to modify i_size from prepare_write?

I think so - I seem to recall seeing that done somewhere else...

One thing to watch out for is when to bring the page uptodate.  If the page
is uptodate then the read() code just won't try to lock it at all.  If you
increase i_size AND set PG_uptodate too early you could open a window which
allows read() to peek at uninitialised data.

But if you set the page uptodate only when its contents really are sane
then things should work OK.

If you end up doing

	memset(page, 0, N);
	SetPageUptodate(page);

then I think you'll need an smb_wmb() in between so the read() code sees the
above two writes in the correct order.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 21:53     ` Andrew Morton
@ 2004-11-22 23:30       ` Anton Altaparmakov
  2004-11-22 23:43         ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Anton Altaparmakov @ 2004-11-22 23:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hirofumi, torvalds, linux-kernel

On Mon, 22 Nov 2004, Andrew Morton wrote:
> Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> >
> > Would it be ok to modify i_size from prepare_write?
> 
> I think so - I seem to recall seeing that done somewhere else...

Great.

> One thing to watch out for is when to bring the page uptodate.  If the 
> page is uptodate then the read() code just won't try to lock it at all.  
> If you increase i_size AND set PG_uptodate too early you could open a 
> window which allows read() to peek at uninitialised data.
> 
> But if you set the page uptodate only when its contents really are sane
> then things should work OK.

Yes, I would never set PG_Uptodate without having sane page contents 
first.

In fact all the metadata writing code in NTFS actually clears PG_Uptodate 
on a locked page because it mangles the page contents, writes them out, 
and then demangles them again, before finally setting PG_Uptodate again 
to protect against read_cache_page() getting hold of pages with i/o in 
flight which for NTFS metadata would cause corruption.

> If you end up doing
> 
> 	memset(page, 0, N);
> 	SetPageUptodate(page);
> 
> then I think you'll need an smb_wmb() in between so the read() code sees the
> above two writes in the correct order.

Thanks.  I always have a flush_dcache_page(page) between the memset() and 
the SetPageUptodate() so I don't need the barrier, right?  Or does the 
flush_dcache_page() not imply ordering?  (I naively thought it did...)

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 23:30       ` Anton Altaparmakov
@ 2004-11-22 23:43         ` Andrew Morton
  2005-01-13 14:47           ` write barriers - Was: " Anton Altaparmakov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2004-11-22 23:43 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: hirofumi, torvalds, linux-kernel

Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>
> I always have a flush_dcache_page(page) between the memset() and 
> the SetPageUptodate() so I don't need the barrier, right?  Or does the 
> flush_dcache_page() not imply ordering?

err, flush_dcache_page() might indeed provide a write barrier on all
architectures which need write barriers.  Then again it might not ;) It's
not intended that this be the case.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 21:43     ` Andrew Morton
@ 2004-11-23  2:22       ` OGAWA Hirofumi
  2004-11-23  2:30         ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: OGAWA Hirofumi @ 2004-11-23  2:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, linux-kernel

Andrew Morton <akpm@osdl.org> writes:

>> Umm... however, if ->i_size is updated before ->commit_write(),
>> doesn't it allow access to those pages, before all write() work is
>> successful?
>
> That's OK.  A thread which is read()ing that page will either
>
> a) decide that the page is outside i_size, and won't read it anyway or
>
> b) decide that the page is inside i_size and will read the page's contents.
>
> Still, I'd be inclined to update i_size after running ->commit_write.  It
> looks like we can simply replace the call to __block_commit_write() with a
> call to generic_commit_write().

If ->prepare_write() failed, I thought we should restore the ->i_size
by vmtruncate() before running ->prepare_write().

But, it's not required... yes?

Anyway, fixed patch is the following.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>



 fs/buffer.c |    3 +--
 1 files changed, 1 insertion(+), 2 deletions(-)

diff -puN fs/buffer.c~cont_prepare_write-fix fs/buffer.c
--- linux-2.6.10-rc2/fs/buffer.c~cont_prepare_write-fix	2004-11-23 11:10:10.000000000 +0900
+++ linux-2.6.10-rc2-hirofumi/fs/buffer.c	2004-11-23 11:10:10.000000000 +0900
@@ -2224,8 +2224,7 @@ int cont_prepare_write(struct page *page
 		memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
 		flush_dcache_page(new_page);
 		kunmap_atomic(kaddr, KM_USER0);
-		__block_commit_write(inode, new_page,
-				zerofrom, PAGE_CACHE_SIZE);
+		generic_commit_write(NULL, new_page, zerofrom, PAGE_CACHE_SIZE);
 		unlock_page(new_page);
 		page_cache_release(new_page);
 	}
_

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-23  2:22       ` OGAWA Hirofumi
@ 2004-11-23  2:30         ` Andrew Morton
  2004-11-23  2:55           ` OGAWA Hirofumi
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2004-11-23  2:30 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: torvalds, linux-kernel

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
> Andrew Morton <akpm@osdl.org> writes:
> 
> >> Umm... however, if ->i_size is updated before ->commit_write(),
> >> doesn't it allow access to those pages, before all write() work is
> >> successful?
> >
> > That's OK.  A thread which is read()ing that page will either
> >
> > a) decide that the page is outside i_size, and won't read it anyway or
> >
> > b) decide that the page is inside i_size and will read the page's contents.
> >
> > Still, I'd be inclined to update i_size after running ->commit_write.  It
> > looks like we can simply replace the call to __block_commit_write() with a
> > call to generic_commit_write().
> 
> If ->prepare_write() failed, I thought we should restore the ->i_size
> by vmtruncate() before running ->prepare_write().

                  ^^^^^^ I assume you meant "after"

> 
> But, it's not required... yes?

yes, it's needed in theory - see generic_file_buffered_write().  I'm trying
to remember why...

I think the only problem which that is solving is that the filesystem may
have left some blocks in the file outside i_size.  That's a minor
consistency issue which a fsck will fix up.  But I guess a subsequent lseek
may permit unwritten disk blocks to be read.

This problem is present whenever ->prepare_write() is called and we really
shouldn't be open-coding it everywhere.

> Anyway, fixed patch is the following.

Thanks. Does it pass all your testing?

> -- 
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> 
> 
> 
>  fs/buffer.c |    3 +--
>  1 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff -puN fs/buffer.c~cont_prepare_write-fix fs/buffer.c
> --- linux-2.6.10-rc2/fs/buffer.c~cont_prepare_write-fix	2004-11-23 11:10:10.000000000 +0900
> +++ linux-2.6.10-rc2-hirofumi/fs/buffer.c	2004-11-23 11:10:10.000000000 +0900
> @@ -2224,8 +2224,7 @@ int cont_prepare_write(struct page *page
>  		memset(kaddr+zerofrom, 0, PAGE_CACHE_SIZE-zerofrom);
>  		flush_dcache_page(new_page);
>  		kunmap_atomic(kaddr, KM_USER0);
> -		__block_commit_write(inode, new_page,
> -				zerofrom, PAGE_CACHE_SIZE);
> +		generic_commit_write(NULL, new_page, zerofrom, PAGE_CACHE_SIZE);
>  		unlock_page(new_page);
>  		page_cache_release(new_page);
>  	}
> _

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-23  2:30         ` Andrew Morton
@ 2004-11-23  2:55           ` OGAWA Hirofumi
  0 siblings, 0 replies; 15+ messages in thread
From: OGAWA Hirofumi @ 2004-11-23  2:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, linux-kernel

Andrew Morton <akpm@osdl.org> writes:

>> But, it's not required... yes?
>
> yes, it's needed in theory - see generic_file_buffered_write().  I'm trying
> to remember why...
>
> I think the only problem which that is solving is that the filesystem may
> have left some blocks in the file outside i_size.  That's a minor
> consistency issue which a fsck will fix up.  But I guess a subsequent lseek
> may permit unwritten disk blocks to be read.
>
> This problem is present whenever ->prepare_write() is called and we really
> shouldn't be open-coding it everywhere.

I see. Thanks.

>> Anyway, fixed patch is the following.
>
> Thanks. Does it pass all your testing?

Sorry, no. I'm still compiling kernel. I'll report the result of test
to you (probably few hours).
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* write barriers - Was: Re: [RFC][PATCH] problem of cont_prepare_write()
  2004-11-22 23:43         ` Andrew Morton
@ 2005-01-13 14:47           ` Anton Altaparmakov
  2005-01-13 15:56             ` Chris Friesen
  0 siblings, 1 reply; 15+ messages in thread
From: Anton Altaparmakov @ 2005-01-13 14:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hirofumi, Linus Torvalds, lkml

Hi,

Sorry to pickup such a long thread but here goes anyway...

On Mon, 2004-11-22 at 15:43 -0800, Andrew Morton wrote:
> Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> >
> > I always have a flush_dcache_page(page) between the memset() and 
> > the SetPageUptodate() so I don't need the barrier, right?  Or does the 
> > flush_dcache_page() not imply ordering?
> 
> err, flush_dcache_page() might indeed provide a write barrier on all
> architectures which need write barriers.  Then again it might not ;) It's
> not intended that this be the case.

What about if the page is unmapped between the memset() and the
SetPageUptodate()?  Does that imply ordering?  I.e. do I need a write
barrier in code like this:

memset(page_address(page), 0, blah);
flush_dcache_page(page);
kunmap(page);
SetPageUptodate(page);

And a more general question:

If I am setting two variables in sequence and it is essential that if a
different cpu reads those variables it seems them updated in the same
order as they were written in the C code do I need a write barrier in
between the two?  For example:

ntfs_inode->allocated_size = 10;
ntfs_inode->initilized_size = 10;

Should another CPU see initialized_size = 10 but allocated_size < 10 the
ntfs driver will blow up in some places.  So does that mean I need a
write barrier, between the two?

If yes, do I still need it if I wrap the two settings (and all accesses)
with a spin lock?  And in particular with a rw-spinlock?  For example:

write_lock_irqsave(&ntfs_inode->size_lock, flags);
ntfs_inode->allocated_size = 10;
ntfs_inode->initilized_size = 10;
write_unlock_irqrestore(&ntfs_inode->size_lock, flags);

Do I still need a write barrier or does the spinlock imply it already?

Thanks a lot in advance and apologies for the stupid(?) questions...

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: write barriers - Was: Re: [RFC][PATCH] problem of cont_prepare_write()
  2005-01-13 14:47           ` write barriers - Was: " Anton Altaparmakov
@ 2005-01-13 15:56             ` Chris Friesen
  2005-01-13 18:08               ` Florian Weimer
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Friesen @ 2005-01-13 15:56 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: Andrew Morton, hirofumi, Linus Torvalds, lkml

Anton Altaparmakov wrote:

> If I am setting two variables in sequence and it is essential that if a
> different cpu reads those variables it seems them updated in the same
> order as they were written in the C code do I need a write barrier in
> between the two?  For example:
> 
> ntfs_inode->allocated_size = 10;
> ntfs_inode->initilized_size = 10;

I believe so.  You may also need to cast them as volatile to prevent the 
compiler from reordering--can someone with more gcc knowledge than I 
state definitively whether or not it is smart enough to not reorder 
barriers?

> Should another CPU see initialized_size = 10 but allocated_size < 10 the
> ntfs driver will blow up in some places.  So does that mean I need a
> write barrier, between the two?

As above.

You may also need a read barrier to ensure that they are not 
speculatively loaded in the wrong order--could someone more knowledgable 
than I comment on that?

> If yes, do I still need it if I wrap the two settings (and all accesses)
> with a spin lock?  And in particular with a rw-spinlock?  For example:
> 
> write_lock_irqsave(&ntfs_inode->size_lock, flags);
> ntfs_inode->allocated_size = 10;
> ntfs_inode->initilized_size = 10;
> write_unlock_irqrestore(&ntfs_inode->size_lock, flags);
> 
> Do I still need a write barrier or does the spinlock imply it already?

I believe the spinlock implies it.

> Thanks a lot in advance and apologies for the stupid(?) questions...

Not stupid.  Concurrency is hard.

Chris

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: write barriers - Was: Re: [RFC][PATCH] problem of cont_prepare_write()
  2005-01-13 15:56             ` Chris Friesen
@ 2005-01-13 18:08               ` Florian Weimer
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Weimer @ 2005-01-13 18:08 UTC (permalink / raw)
  To: Chris Friesen; +Cc: lkml

* Chris Friesen:

> I believe so.  You may also need to cast them as volatile to prevent the 
> compiler from reordering--can someone with more gcc knowledge than I 
> state definitively whether or not it is smart enough to not reorder 
> barriers?

If you define it properly, GCC won't reorder it (volatile __asm__ with
memory operands should be sufficient).

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2005-01-13 18:17 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-22 10:30 [RFC][PATCH] problem of cont_prepare_write() OGAWA Hirofumi
2004-11-22 10:41 ` Andrew Morton
2004-11-22 10:46 ` Andrew Morton
2004-11-22 11:03   ` Anton Altaparmakov
2004-11-22 21:53     ` Andrew Morton
2004-11-22 23:30       ` Anton Altaparmakov
2004-11-22 23:43         ` Andrew Morton
2005-01-13 14:47           ` write barriers - Was: " Anton Altaparmakov
2005-01-13 15:56             ` Chris Friesen
2005-01-13 18:08               ` Florian Weimer
2004-11-22 11:38   ` OGAWA Hirofumi
2004-11-22 21:43     ` Andrew Morton
2004-11-23  2:22       ` OGAWA Hirofumi
2004-11-23  2:30         ` Andrew Morton
2004-11-23  2:55           ` OGAWA Hirofumi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).