All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -V2 1/4] ext4: Check for only delay or unwritten buffer_heads
@ 2009-06-04  6:10 Aneesh Kumar K.V
  2009-06-04  6:10 ` [PATCH -V2 2/4] ext4: Add generic writepage callback Aneesh Kumar K.V
  2009-06-04 13:16 ` [PATCH -V2 1/4] ext4: Check for only delay or unwritten buffer_heads Theodore Tso
  0 siblings, 2 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2009-06-04  6:10 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V

Even with changes to make pages writeprotech on truncate/i_size update we
can still see buffer_heads which are not mapped in the writepage
callback. Consider the below case.

1) truncate(f, 1024)
2) mmap(f, 0, 4096)
3) a[0] = 'a'
4) truncate(f, 4096)
5) writepage(...)

Now if we get a writepage callback immediately after (4) and before an
attempt to write at any other offset via mmap address (which implies we
are yet to get a pagefault and do a get_block) what we would have is the
page which is dirty have first block allocated and the other three
buffer_heads unmapped.

In the above case the writepage should go ahead and try to write
the first blocks and clear the page_dirty flag. Because the further
attempt to write to the page will again create a fault and result in
allocating blocks and marking page dirty. Also if we don't write
any other offset via mmap address we would still have written the first
block to the disk and rest of the space will be considered as a hole.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |   21 +++++++--------------
 1 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b7fd336..6b61ee5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2225,15 +2225,9 @@ static void mpage_add_bh_to_extent(struct mpage_da_data *mpd,
 	return;
 }
 
-static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
+static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh)
 {
-	/*
-	 * unmapped buffer is possible for holes.
-	 * delay buffer is possible with delayed allocation.
-	 * We also need to consider unwritten buffer as unmapped.
-	 */
-	return (!buffer_mapped(bh) || buffer_delay(bh) ||
-				buffer_unwritten(bh)) && buffer_dirty(bh);
+	return (buffer_delay(bh) || buffer_unwritten(bh)) && buffer_dirty(bh);
 }
 
 /*
@@ -2320,7 +2314,7 @@ static int __mpage_da_writepage(struct page *page,
 			 * Otherwise we won't make progress
 			 * with the page in ext4_da_writepage
 			 */
-			if (ext4_bh_unmapped_or_delay(NULL, bh)) {
+			if (ext4_bh_delay_or_unwritten(NULL, bh)) {
 				mpage_add_bh_to_extent(mpd, logical,
 						       bh->b_size,
 						       bh->b_state);
@@ -2437,7 +2431,6 @@ static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
 	 * so call get_block_wrap with create = 0
 	 */
 	ret = ext4_get_blocks(NULL, inode, iblock, max_blocks, bh_result, 0);
-	BUG_ON(create && ret == 0);
 	if (ret > 0) {
 		bh_result->b_size = (ret << inode->i_blkbits);
 		ret = 0;
@@ -2473,7 +2466,7 @@ static int ext4_da_writepage(struct page *page,
 	if (page_has_buffers(page)) {
 		page_bufs = page_buffers(page);
 		if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
-					ext4_bh_unmapped_or_delay)) {
+					ext4_bh_delay_or_unwritten)) {
 			/*
 			 * We don't want to do  block allocation
 			 * So redirty the page and return
@@ -2506,7 +2499,7 @@ static int ext4_da_writepage(struct page *page,
 			page_bufs = page_buffers(page);
 			/* check whether all are mapped and non delay */
 			if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
-						ext4_bh_unmapped_or_delay)) {
+						ext4_bh_delay_or_unwritten)) {
 				redirty_page_for_writepage(wbc, page);
 				unlock_page(page);
 				return 0;
@@ -3182,7 +3175,7 @@ static int ext4_normal_writepage(struct page *page,
 		 * happily proceed with mapping them and writing the page.
 		 */
 		BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
-					ext4_bh_unmapped_or_delay));
+					ext4_bh_delay_or_unwritten));
 	}
 
 	if (!ext4_journal_current_handle())
@@ -3274,7 +3267,7 @@ static int ext4_journalled_writepage(struct page *page,
 		 * happily proceed with mapping them and writing the page.
 		 */
 		BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
-					ext4_bh_unmapped_or_delay));
+					ext4_bh_delay_or_unwritten));
 	}
 
 	if (ext4_journal_current_handle())
-- 
1.6.3.1.244.gf9275


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH -V2 2/4] ext4: Add generic writepage callback
  2009-06-04  6:10 [PATCH -V2 1/4] ext4: Check for only delay or unwritten buffer_heads Aneesh Kumar K.V
@ 2009-06-04  6:10 ` Aneesh Kumar K.V
  2009-06-04  6:10   ` [PATCH -V2 3/4] ext4: Move some static functions around Aneesh Kumar K.V
                     ` (2 more replies)
  2009-06-04 13:16 ` [PATCH -V2 1/4] ext4: Check for only delay or unwritten buffer_heads Theodore Tso
  1 sibling, 3 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2009-06-04  6:10 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V

Even with changes to make pages writeprotect on truncate/i_size update we
can still see buffer_heads which are not mapped in the writepage
callback. Consider the below case.

1) truncate(f, 1024)
2) mmap(f, 0, 4096)
3) a[0] = 'a'
4) truncate(f, 4096)
5) writepage(...)

Now if we get a writepage callback immediately after (4) and before an
attempt to write at any other offset via mmap address (which implies we
are yet to get a pagefault and do a get_block) what we would have is the
page which is dirty have first block allocated and the other three
buffer_heads unmapped.

In the above case the writepage should go ahead and try to write
the first blocks and clear the page_dirty flag. Because the further
attempt to write to the page will again create a fault and result in
allocating blocks and marking page dirty. Also if we don't write
any other offset via mmap address we would still have written the first
block to the disk and rest of the space will be considered as a hole.

Delayed allocation writepage callback already did most of these. So
extend it to allow block_write_full_page on pages with unmapped buffer_heads.
We pass noalloc_get_block_write as the call back which ensures we don't
do any block allocation in writepage callback. That is need because of
the locking order between page lock and journal_start.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |  234 +++++++++++++------------------------------------------
 1 files changed, 53 insertions(+), 181 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6b61ee5..7cfacff 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2312,7 +2312,7 @@ static int __mpage_da_writepage(struct page *page,
 			 * We need to try to allocate
 			 * unmapped blocks in the same page.
 			 * Otherwise we won't make progress
-			 * with the page in ext4_da_writepage
+			 * with the page in ext4_writepage
 			 */
 			if (ext4_bh_delay_or_unwritten(NULL, bh)) {
 				mpage_add_bh_to_extent(mpd, logical,
@@ -2439,13 +2439,47 @@ static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
 }
 
 /*
+ * Note that we don't need to start a transaction unless we're journaling data
+ * because we should have holes filled from ext4_page_mkwrite(). We even don't
+ * need to file the inode to the transaction's list in ordered mode because if
+ * we are writing back data added by write(), the inode is already there and if
+ * we are writing back data modified via mmap(), noone guarantees in which
+ * transaction the data will hit the disk. In case we are journaling data, we
+ * cannot start transaction directly because transaction start ranks above page
+ * lock so we have to do some magic.
+ *
  * This function can get called via...
  *   - ext4_da_writepages after taking page lock (have journal handle)
  *   - journal_submit_inode_data_buffers (no journal handle)
  *   - shrink_page_list via pdflush (no journal handle)
  *   - grab_page_cache when doing write_begin (have journal handle)
+ *
+ * We don't do any block allocation in this function. If we have page with
+ * multiple blocks we need to write those buffer_heads that are mapped. This
+ * is important for mmaped based write. So if we do with blocksize 1K
+ * truncate(f, 1024);
+ * a = mmap(f, 0, 4096);
+ * a[0] = 'a';
+ * truncate(f, 4096);
+ * we have in the page first buffer_head mapped via page_mkwrite call back
+ * but other bufer_heads would be unmapped but dirty(dirty done via the
+ * do_wp_page). So writepage should write the first block. If we modify
+ * the mmap area beyond 1024 we will again get a page_fault and the
+ * page_mkwrite callback will do the block allocation and mark the
+ * buffer_heads mapped.
+ *
+ * We redirty the page if we have any buffer_heads that is either delay or
+ * unwritten in the page.
+ *
+ * We can get recursively called as show below.
+ *
+ *	ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
+ *		ext4_writepage()
+ *
+ * But since we don't do any block allocation we should not deadlock.
+ * Page also have the dirty flag cleared so we don't get recurive page_lock.
  */
-static int ext4_da_writepage(struct page *page,
+static int ext4_writepage(struct page *page,
 				struct writeback_control *wbc)
 {
 	int ret = 0;
@@ -2454,7 +2488,7 @@ static int ext4_da_writepage(struct page *page,
 	struct buffer_head *page_bufs;
 	struct inode *inode = page->mapping->host;
 
-	trace_mark(ext4_da_writepage,
+	trace_mark(ext4_writepage,
 		   "dev %s ino %lu page_index %lu",
 		   inode->i_sb->s_id, inode->i_ino, page->index);
 	size = i_size_read(inode);
@@ -2518,6 +2552,15 @@ static int ext4_da_writepage(struct page *page,
 		block_commit_write(page, 0, len);
 	}
 
+	if (PageChecked(page) && ext4_should_journal_data(inode)) {
+		/*
+		 * It's mmapped pagecache.  Add buffers and journal it.  There
+		 * doesn't seem much point in redirtying the page here.
+		 */
+		ClearPageChecked(page);
+		return __ext4_journalled_writepage(page, wbc, len);
+	}
+
 	if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
 		ret = nobh_writepage(page, noalloc_get_block_write, wbc);
 	else
@@ -3083,114 +3126,10 @@ static int bput_one(handle_t *handle, struct buffer_head *bh)
 	return 0;
 }
 
-/*
- * Note that we don't need to start a transaction unless we're journaling data
- * because we should have holes filled from ext4_page_mkwrite(). We even don't
- * need to file the inode to the transaction's list in ordered mode because if
- * we are writing back data added by write(), the inode is already there and if
- * we are writing back data modified via mmap(), noone guarantees in which
- * transaction the data will hit the disk. In case we are journaling data, we
- * cannot start transaction directly because transaction start ranks above page
- * lock so we have to do some magic.
- *
- * In all journaling modes block_write_full_page() will start the I/O.
- *
- * Problem:
- *
- *	ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
- *		ext4_writepage()
- *
- * Similar for:
- *
- *	ext4_file_write() -> generic_file_write() -> __alloc_pages() -> ...
- *
- * Same applies to ext4_get_block().  We will deadlock on various things like
- * lock_journal and i_data_sem
- *
- * Setting PF_MEMALLOC here doesn't work - too many internal memory
- * allocations fail.
- *
- * 16May01: If we're reentered then journal_current_handle() will be
- *	    non-zero. We simply *return*.
- *
- * 1 July 2001: @@@ FIXME:
- *   In journalled data mode, a data buffer may be metadata against the
- *   current transaction.  But the same file is part of a shared mapping
- *   and someone does a writepage() on it.
- *
- *   We will move the buffer onto the async_data list, but *after* it has
- *   been dirtied. So there's a small window where we have dirty data on
- *   BJ_Metadata.
- *
- *   Note that this only applies to the last partial page in the file.  The
- *   bit which block_write_full_page() uses prepare/commit for.  (That's
- *   broken code anyway: it's wrong for msync()).
- *
- *   It's a rare case: affects the final partial page, for journalled data
- *   where the file is subject to bith write() and writepage() in the same
- *   transction.  To fix it we'll need a custom block_write_full_page().
- *   We'll probably need that anyway for journalling writepage() output.
- *
- * We don't honour synchronous mounts for writepage().  That would be
- * disastrous.  Any write() or metadata operation will sync the fs for
- * us.
- *
- */
-static int __ext4_normal_writepage(struct page *page,
-				struct writeback_control *wbc)
-{
-	struct inode *inode = page->mapping->host;
-
-	if (test_opt(inode->i_sb, NOBH))
-		return nobh_writepage(page, noalloc_get_block_write, wbc);
-	else
-		return block_write_full_page(page, noalloc_get_block_write,
-					     wbc);
-}
-
-static int ext4_normal_writepage(struct page *page,
-				struct writeback_control *wbc)
-{
-	struct inode *inode = page->mapping->host;
-	loff_t size = i_size_read(inode);
-	loff_t len;
-
-	trace_mark(ext4_normal_writepage,
-		   "dev %s ino %lu page_index %lu",
-		   inode->i_sb->s_id, inode->i_ino, page->index);
-	J_ASSERT(PageLocked(page));
-	if (page->index == size >> PAGE_CACHE_SHIFT)
-		len = size & ~PAGE_CACHE_MASK;
-	else
-		len = PAGE_CACHE_SIZE;
-
-	if (page_has_buffers(page)) {
-		/* if page has buffers it should all be mapped
-		 * and allocated. If there are not buffers attached
-		 * to the page we know the page is dirty but it lost
-		 * buffers. That means that at some moment in time
-		 * after write_begin() / write_end() has been called
-		 * all buffers have been clean and thus they must have been
-		 * written at least once. So they are all mapped and we can
-		 * happily proceed with mapping them and writing the page.
-		 */
-		BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
-					ext4_bh_delay_or_unwritten));
-	}
-
-	if (!ext4_journal_current_handle())
-		return __ext4_normal_writepage(page, wbc);
-
-	redirty_page_for_writepage(wbc, page);
-	unlock_page(page);
-	return 0;
-}
-
 static int __ext4_journalled_writepage(struct page *page,
-				struct writeback_control *wbc)
+					struct writeback_control *wbc,
+					unsigned int len)
 {
-	loff_t size;
-	unsigned int len;
 	struct address_space *mapping = page->mapping;
 	struct inode *inode = mapping->host;
 	struct buffer_head *page_bufs;
@@ -3198,16 +3137,8 @@ static int __ext4_journalled_writepage(struct page *page,
 	int ret = 0;
 	int err;
 
-	size = i_size_read(inode);
-	if (page->index == size >> PAGE_CACHE_SHIFT)
-		len = size & ~PAGE_CACHE_MASK;
-	else
-		len = PAGE_CACHE_SIZE;
-	ret = block_prepare_write(page, 0, len, noalloc_get_block_write);
-	if (ret != 0)
-		goto out_unlock;
-
 	page_bufs = page_buffers(page);
+	BUG_ON(!page_bufs);
 	walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
 	/* As soon as we unlock the page, it can go away, but we have
 	 * references to buffers so we are safe */
@@ -3232,69 +3163,10 @@ static int __ext4_journalled_writepage(struct page *page,
 
 	walk_page_buffers(handle, page_bufs, 0, len, NULL, bput_one);
 	EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
-	goto out;
-
-out_unlock:
-	unlock_page(page);
 out:
 	return ret;
 }
 
-static int ext4_journalled_writepage(struct page *page,
-				struct writeback_control *wbc)
-{
-	struct inode *inode = page->mapping->host;
-	loff_t size = i_size_read(inode);
-	loff_t len;
-
-	trace_mark(ext4_journalled_writepage,
-		   "dev %s ino %lu page_index %lu",
-		   inode->i_sb->s_id, inode->i_ino, page->index);
-	J_ASSERT(PageLocked(page));
-	if (page->index == size >> PAGE_CACHE_SHIFT)
-		len = size & ~PAGE_CACHE_MASK;
-	else
-		len = PAGE_CACHE_SIZE;
-
-	if (page_has_buffers(page)) {
-		/* if page has buffers it should all be mapped
-		 * and allocated. If there are not buffers attached
-		 * to the page we know the page is dirty but it lost
-		 * buffers. That means that at some moment in time
-		 * after write_begin() / write_end() has been called
-		 * all buffers have been clean and thus they must have been
-		 * written at least once. So they are all mapped and we can
-		 * happily proceed with mapping them and writing the page.
-		 */
-		BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
-					ext4_bh_delay_or_unwritten));
-	}
-
-	if (ext4_journal_current_handle())
-		goto no_write;
-
-	if (PageChecked(page)) {
-		/*
-		 * It's mmapped pagecache.  Add buffers and journal it.  There
-		 * doesn't seem much point in redirtying the page here.
-		 */
-		ClearPageChecked(page);
-		return __ext4_journalled_writepage(page, wbc);
-	} else {
-		/*
-		 * It may be a page full of checkpoint-mode buffers.  We don't
-		 * really know unless we go poke around in the buffer_heads.
-		 * But block_write_full_page will do the right thing.
-		 */
-		return block_write_full_page(page, noalloc_get_block_write,
-					     wbc);
-	}
-no_write:
-	redirty_page_for_writepage(wbc, page);
-	unlock_page(page);
-	return 0;
-}
-
 static int ext4_readpage(struct file *file, struct page *page)
 {
 	return mpage_readpage(page, ext4_get_block);
@@ -3441,7 +3313,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
 static const struct address_space_operations ext4_ordered_aops = {
 	.readpage		= ext4_readpage,
 	.readpages		= ext4_readpages,
-	.writepage		= ext4_normal_writepage,
+	.writepage		= ext4_writepage,
 	.sync_page		= block_sync_page,
 	.write_begin		= ext4_write_begin,
 	.write_end		= ext4_ordered_write_end,
@@ -3456,7 +3328,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
 static const struct address_space_operations ext4_writeback_aops = {
 	.readpage		= ext4_readpage,
 	.readpages		= ext4_readpages,
-	.writepage		= ext4_normal_writepage,
+	.writepage		= ext4_writepage,
 	.sync_page		= block_sync_page,
 	.write_begin		= ext4_write_begin,
 	.write_end		= ext4_writeback_write_end,
@@ -3471,7 +3343,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
 static const struct address_space_operations ext4_journalled_aops = {
 	.readpage		= ext4_readpage,
 	.readpages		= ext4_readpages,
-	.writepage		= ext4_journalled_writepage,
+	.writepage		= ext4_writepage,
 	.sync_page		= block_sync_page,
 	.write_begin		= ext4_write_begin,
 	.write_end		= ext4_journalled_write_end,
@@ -3485,7 +3357,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
 static const struct address_space_operations ext4_da_aops = {
 	.readpage		= ext4_readpage,
 	.readpages		= ext4_readpages,
-	.writepage		= ext4_da_writepage,
+	.writepage		= ext4_writepage,
 	.writepages		= ext4_da_writepages,
 	.sync_page		= block_sync_page,
 	.write_begin		= ext4_da_write_begin,
-- 
1.6.3.1.244.gf9275


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH -V2 3/4] ext4: Move some static functions around
  2009-06-04  6:10 ` [PATCH -V2 2/4] ext4: Add generic writepage callback Aneesh Kumar K.V
@ 2009-06-04  6:10   ` Aneesh Kumar K.V
  2009-06-04  6:10     ` [PATCH -V2 4/4] ext4: Add WARN_ON on unmapped dirty buffer_heads in writepage Aneesh Kumar K.V
  2009-06-04 13:47     ` [PATCH -V2 3/4] ext4: Move some static functions around Theodore Tso
  2009-06-04 12:05   ` [PATCH -V2 2/4] ext4: Add generic writepage callback Theodore Tso
  2009-06-04 13:20   ` Theodore Tso
  2 siblings, 2 replies; 9+ messages in thread
From: Aneesh Kumar K.V @ 2009-06-04  6:10 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V

Move some static functions around so that we can
avoid some forward declaration. Also fix some compile
warnings

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/ext4/inode.c |  110 +++++++++++++++++++++++++++----------------------------
 fs/ext4/namei.c |    2 +-
 2 files changed, 55 insertions(+), 57 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7cfacff..2219daa 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -933,11 +933,8 @@ static int ext4_ind_get_blocks(handle_t *handle, struct inode *inode,
 	int indirect_blks;
 	int blocks_to_boundary = 0;
 	int depth;
-	struct ext4_inode_info *ei = EXT4_I(inode);
 	int count = 0;
 	ext4_fsblk_t first_block = 0;
-	loff_t disksize;
-
 
 	J_ASSERT(!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL));
 	J_ASSERT(handle != NULL || (flags & EXT4_GET_BLOCKS_CREATE) == 0);
@@ -2438,6 +2435,60 @@ static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
 	return ret;
 }
 
+static int bget_one(handle_t *handle, struct buffer_head *bh)
+{
+	get_bh(bh);
+	return 0;
+}
+
+static int bput_one(handle_t *handle, struct buffer_head *bh)
+{
+	put_bh(bh);
+	return 0;
+}
+
+static int __ext4_journalled_writepage(struct page *page,
+					struct writeback_control *wbc,
+					unsigned int len)
+{
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
+	struct buffer_head *page_bufs;
+	handle_t *handle = NULL;
+	int ret = 0;
+	int err;
+
+	page_bufs = page_buffers(page);
+	BUG_ON(!page_bufs);
+	walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
+	/* As soon as we unlock the page, it can go away, but we have
+	 * references to buffers so we are safe */
+	unlock_page(page);
+
+	handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		goto out;
+	}
+
+	ret = walk_page_buffers(handle, page_bufs, 0, len, NULL,
+				do_journal_get_write_access);
+
+	err = walk_page_buffers(handle, page_bufs, 0, len, NULL,
+				write_end_fn);
+	if (ret == 0)
+		ret = err;
+	err = ext4_journal_stop(handle);
+	if (!ret)
+		ret = err;
+
+	walk_page_buffers(handle, page_bufs, 0, len, NULL, bput_one);
+	EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
+out:
+	return ret;
+}
+
+
 /*
  * Note that we don't need to start a transaction unless we're journaling data
  * because we should have holes filled from ext4_page_mkwrite(). We even don't
@@ -3114,59 +3165,6 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
 	return generic_block_bmap(mapping, block, ext4_get_block);
 }
 
-static int bget_one(handle_t *handle, struct buffer_head *bh)
-{
-	get_bh(bh);
-	return 0;
-}
-
-static int bput_one(handle_t *handle, struct buffer_head *bh)
-{
-	put_bh(bh);
-	return 0;
-}
-
-static int __ext4_journalled_writepage(struct page *page,
-					struct writeback_control *wbc,
-					unsigned int len)
-{
-	struct address_space *mapping = page->mapping;
-	struct inode *inode = mapping->host;
-	struct buffer_head *page_bufs;
-	handle_t *handle = NULL;
-	int ret = 0;
-	int err;
-
-	page_bufs = page_buffers(page);
-	BUG_ON(!page_bufs);
-	walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
-	/* As soon as we unlock the page, it can go away, but we have
-	 * references to buffers so we are safe */
-	unlock_page(page);
-
-	handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
-	if (IS_ERR(handle)) {
-		ret = PTR_ERR(handle);
-		goto out;
-	}
-
-	ret = walk_page_buffers(handle, page_bufs, 0, len, NULL,
-				do_journal_get_write_access);
-
-	err = walk_page_buffers(handle, page_bufs, 0, len, NULL,
-				write_end_fn);
-	if (ret == 0)
-		ret = err;
-	err = ext4_journal_stop(handle);
-	if (!ret)
-		ret = err;
-
-	walk_page_buffers(handle, page_bufs, 0, len, NULL, bput_one);
-	EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
-out:
-	return ret;
-}

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH -V2 4/4] ext4: Add WARN_ON on unmapped dirty buffer_heads in writepage
  2009-06-04  6:10   ` [PATCH -V2 3/4] ext4: Move some static functions around Aneesh Kumar K.V
@ 2009-06-04  6:10     ` Aneesh Kumar K.V
  2009-06-04 13:50       ` Theodore Tso
  2009-06-04 13:47     ` [PATCH -V2 3/4] ext4: Move some static functions around Theodore Tso
  1 sibling, 1 reply; 9+ messages in thread
From: Aneesh Kumar K.V @ 2009-06-04  6:10 UTC (permalink / raw)
  To: cmm, tytso, sandeen; +Cc: linux-ext4, Aneesh Kumar K.V

Now we have block_lock_hole_extend clearing the dirty flag of
buffer_heads outside i_size we should not find buffer_heads
which are unmapped and dirty in writepage. If we find do a WARN_ON.
We can still continue because block_write_full page look at the mapped
flag only.

Following sequence of events would result in the above condition.
1) truncate(f, 1024)
2) mmap(f, 0, 4096)
3) a[0] = 'a'
4) truncate(f, 4096)
5) writepage(...)

After step 3 we would have unmapped buffer_heads outside i_size.
After step 4 we would have unmapped buffer_heads within i_size.

Now that truncate is calling block_lock_hole_extend which in turn
is clearing the dirty flag, we can safely assume that we won't
find unmapped dirty buffer_heads in write page. If we did find one
we should find out why.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2219daa..9bba474 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2488,6 +2488,10 @@ static int __ext4_journalled_writepage(struct page *page,
 	return ret;
 }
 
+static int ext4_bh_unmapped_and_dirty(handle_t *handle, struct buffer_head *bh)
+{
+	return !buffer_mapped(bh) && buffer_dirty(bh);
+}
 
 /*
  * Note that we don't need to start a transaction unless we're journaling data
@@ -2602,6 +2606,14 @@ static int ext4_writepage(struct page *page,
 		/* now mark the buffer_heads as dirty and uptodate */
 		block_commit_write(page, 0, len);
 	}
+	/*
+	 * There should not be any unmapped and dirty
+	 * buffer_heads at this point. Look at block_lock_hole_extend
+	 * for more info. If we find one print more info
+	 */
+	 WARN(walk_page_buffers(NULL, page_bufs, 0, len, NULL,
+				 ext4_bh_unmapped_and_dirty),
+		 "Unmapped dirty buffer_heads found in %s\n", __func__);
 
 	if (PageChecked(page) && ext4_should_journal_data(inode)) {
 		/*
-- 
1.6.3.1.244.gf9275


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH -V2 2/4] ext4: Add generic writepage callback
  2009-06-04  6:10 ` [PATCH -V2 2/4] ext4: Add generic writepage callback Aneesh Kumar K.V
  2009-06-04  6:10   ` [PATCH -V2 3/4] ext4: Move some static functions around Aneesh Kumar K.V
@ 2009-06-04 12:05   ` Theodore Tso
  2009-06-04 13:20   ` Theodore Tso
  2 siblings, 0 replies; 9+ messages in thread
From: Theodore Tso @ 2009-06-04 12:05 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, sandeen, linux-ext4

On Thu, Jun 04, 2009 at 11:40:03AM +0530, Aneesh Kumar K.V wrote:
> Even with changes to make pages writeprotect on truncate/i_size update we
> can still see buffer_heads which are not mapped in the writepage
> callback.

Which changes were you referring to here?  Do you mean Jan's patch
page_mkwrite()'s changes, or something else?


Also it seems like:

ext4: Merge ext4_{journalled,normal,da}_writepage() into ext4_writepage()

might be a better patch title, do you agree?

> 1) truncate(f, 1024)
> 2) mmap(f, 0, 4096)
> 3) a[0] = 'a'
> 4) truncate(f, 4096)
> 5) writepage(...)
> 
> Now if we get a writepage callback immediately after (4) and before an
> attempt to write at any other offset via mmap address (which implies we
> are yet to get a pagefault and do a get_block) what we would have is the
> page which is dirty have first block allocated and the other three
> buffer_heads unmapped.

The first two paragraps of this description are the same as the
previous patch; so really what's happening here is this fixes the same
issue as the previous patch, but for the the case where delayed
allocation is not enabled, is that correct?

						- Ted

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -V2 1/4] ext4: Check for only delay or unwritten buffer_heads
  2009-06-04  6:10 [PATCH -V2 1/4] ext4: Check for only delay or unwritten buffer_heads Aneesh Kumar K.V
  2009-06-04  6:10 ` [PATCH -V2 2/4] ext4: Add generic writepage callback Aneesh Kumar K.V
@ 2009-06-04 13:16 ` Theodore Tso
  1 sibling, 0 replies; 9+ messages in thread
From: Theodore Tso @ 2009-06-04 13:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, sandeen, linux-ext4

I've added this patch to the ext4 patch queue with the folloiwng
changelog comment:

ext4: Fix mmap/truncate race when blocksize < pagesize && delayed allocation

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

It is possible to see buffer_heads which are not mapped in the
writepage callback in the following scneario (where the fs blocksize
is 1k and the page size is 4k):

1) truncate(f, 1024)
2) mmap(f, 0, 4096)
3) a[0] = 'a'
4) truncate(f, 4096)
5) writepage(...)

Now if we get a writepage callback immediately after (4) and before an
attempt to write at any other offset via mmap address (which implies we
are yet to get a pagefault and do a get_block) what we would have is the
page which is dirty have first block allocated and the other three
buffer_heads unmapped.

In the above case the writepage should go ahead and try to write the
first blocks and clear the page_dirty flag. Further attempts to write
to the page will again create a fault and result in allocating blocks
and marking page dirty.  If we don't write any other offset via mmap
address we would still have written the first block to the disk and
rest of the space will be considered as a hole.

So to address this, we change all of the places where we look for
delayed, unmapped, or unwritten buffer heads, and only check for
delayed or unwritten buffer heads instead.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

						- Ted

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -V2 2/4] ext4: Add generic writepage callback
  2009-06-04  6:10 ` [PATCH -V2 2/4] ext4: Add generic writepage callback Aneesh Kumar K.V
  2009-06-04  6:10   ` [PATCH -V2 3/4] ext4: Move some static functions around Aneesh Kumar K.V
  2009-06-04 12:05   ` [PATCH -V2 2/4] ext4: Add generic writepage callback Theodore Tso
@ 2009-06-04 13:20   ` Theodore Tso
  2 siblings, 0 replies; 9+ messages in thread
From: Theodore Tso @ 2009-06-04 13:20 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, sandeen, linux-ext4

I've applied this to the ext4 patch queue with the following changelog
comment.

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaoc

This patch fixes the mmap/truncate race that was fixed for delayed
allocation by merging ext4_{journalled,normal,da}_writepage() into
ext4_writepage().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

I also had to apply the following fix so that the patch would compile:

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1359aa1..24c8634 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -44,6 +44,10 @@
 
 #define MPAGE_DA_EXTENT_TAIL 0x01
 
+static int __ext4_journalled_writepage(struct page *page,
+				       struct writeback_control *wbc,
+				       unsigned int len);
+
 static inline int ext4_begin_ordered_truncate(struct inode *inode,
 					      loff_t new_size)
 {


						- Ted

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH -V2 3/4] ext4: Move some static functions around
  2009-06-04  6:10   ` [PATCH -V2 3/4] ext4: Move some static functions around Aneesh Kumar K.V
  2009-06-04  6:10     ` [PATCH -V2 4/4] ext4: Add WARN_ON on unmapped dirty buffer_heads in writepage Aneesh Kumar K.V
@ 2009-06-04 13:47     ` Theodore Tso
  1 sibling, 0 replies; 9+ messages in thread
From: Theodore Tso @ 2009-06-04 13:47 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, sandeen, linux-ext4

On Thu, Jun 04, 2009 at 11:40:04AM +0530, Aneesh Kumar K.V wrote:
> Move some static functions around so that we can
> avoid some forward declaration. Also fix some compile
> warnings

I've added this with the changelog commit:

    ext4: Move __ext4_journalled_writepage() to avoid forward declaration
    
    In addition, fix two unused variable warnings.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

I also dropped the bh initialization in fs/ext4/namei.c:empty_dir(),
since it's clearly unnecessary and gcc 4.3.3 doesn't emit a warning.

      	   	   	       	       	     - Ted

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH -V2 4/4] ext4: Add WARN_ON on unmapped dirty buffer_heads in writepage
  2009-06-04  6:10     ` [PATCH -V2 4/4] ext4: Add WARN_ON on unmapped dirty buffer_heads in writepage Aneesh Kumar K.V
@ 2009-06-04 13:50       ` Theodore Tso
  0 siblings, 0 replies; 9+ messages in thread
From: Theodore Tso @ 2009-06-04 13:50 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: cmm, sandeen, linux-ext4, Jan Kara

On Thu, Jun 04, 2009 at 11:40:05AM +0530, Aneesh Kumar K.V wrote:
> Now we have block_lock_hole_extend clearing the dirty flag of
> buffer_heads outside i_size we should not find buffer_heads
> which are unmapped and dirty in writepage. If we find do a WARN_ON.
> We can still continue because block_write_full page look at the mapped
> flag only.

This is only true only *after* Jan's patches are applied, right?  So
that means it's only safe to apply this after Jan's patch series is
applied.  So I believe I need to queue this in the unstable portion of
the patch queue until the page_mkwrite() changes are applied;
otherwise we could have some spurious WARN_ON's.

					- Ted

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-06-04 13:50 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-04  6:10 [PATCH -V2 1/4] ext4: Check for only delay or unwritten buffer_heads Aneesh Kumar K.V
2009-06-04  6:10 ` [PATCH -V2 2/4] ext4: Add generic writepage callback Aneesh Kumar K.V
2009-06-04  6:10   ` [PATCH -V2 3/4] ext4: Move some static functions around Aneesh Kumar K.V
2009-06-04  6:10     ` [PATCH -V2 4/4] ext4: Add WARN_ON on unmapped dirty buffer_heads in writepage Aneesh Kumar K.V
2009-06-04 13:50       ` Theodore Tso
2009-06-04 13:47     ` [PATCH -V2 3/4] ext4: Move some static functions around Theodore Tso
2009-06-04 12:05   ` [PATCH -V2 2/4] ext4: Add generic writepage callback Theodore Tso
2009-06-04 13:20   ` Theodore Tso
2009-06-04 13:16 ` [PATCH -V2 1/4] ext4: Check for only delay or unwritten buffer_heads Theodore Tso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.