All of lore.kernel.org
 help / color / mirror / Atom feed
* [Ocfs2-devel] [PATCH] Zero from EOF instead of next block
@ 2011-02-14  6:21 Goldwyn Rodrigues
  2011-02-14  7:25 ` Joel Becker
  0 siblings, 1 reply; 9+ messages in thread
From: Goldwyn Rodrigues @ 2011-02-14  6:21 UTC (permalink / raw)
  To: ocfs2-devel

ocfs2_zero_extend() extends the file from the next page offset
rather from the current file size. zero_extend_page() should
call ocfs2_zero_partial_cluster() to set zero rest of the cluster
when the file size is within the page boundaries.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
---
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index a665195..5a2223e 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -63,6 +63,8 @@
 #include "refcounttree.h"

 #include "buffer_head_io.h"
+static int ocfs2_zero_partial_clusters(struct inode *inode,
+				       u64 start, u64 len);

 static int ocfs2_init_file_private(struct inode *inode, struct file *file)
 {
@@ -960,11 +962,21 @@ int ocfs2_zero_extend(struct inode *inode,
struct buffer_head *di_bh,
 	int ret = 0;
 	u64 zero_start, range_start = 0, range_end = 0;
 	struct super_block *sb = inode->i_sb;
+	u64 current_size = i_size_read(inode), len;

 	zero_start = ocfs2_align_bytes_to_blocks(sb, i_size_read(inode));
 	mlog(0, "zero_start %llu for i_size %llu\n",
 	     (unsigned long long)zero_start,
 	     (unsigned long long)i_size_read(inode));
+
+	len = ocfs2_align_bytes_to_clusters(sb, current_size) - current_size;
+	if (len) {
+		ret = ocfs2_zero_partial_clusters(inode, current_size, len);
+		if (ret)
+			goto out;
+		zero_start = ocfs2_align_bytes_to_clusters(sb, current_size);
+	}
+
 	while (zero_start < zero_to_size) {
 		ret = ocfs2_zero_extend_get_range(inode, di_bh, zero_start,
 						  zero_to_size,
@@ -990,7 +1002,7 @@ int ocfs2_zero_extend(struct inode *inode, struct
buffer_head *di_bh,
 		}
 		zero_start = range_end;
 	}
-
+out:
 	return ret;
 }


-- 
Goldwyn

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Ocfs2-devel] [PATCH] Zero from EOF instead of next block
  2011-02-14  6:21 [Ocfs2-devel] [PATCH] Zero from EOF instead of next block Goldwyn Rodrigues
@ 2011-02-14  7:25 ` Joel Becker
  2011-02-14 18:24   ` Goldwyn Rodrigues
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Becker @ 2011-02-14  7:25 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Feb 14, 2011 at 12:21:41AM -0600, Goldwyn Rodrigues wrote:
> ocfs2_zero_extend() extends the file from the next page offset
> rather from the current file size. zero_extend_page() should
> call ocfs2_zero_partial_cluster() to set zero rest of the cluster
> when the file size is within the page boundaries.

	We just went over this.  It is intentional that we start from
the next block.  Linux code always assumes that the block containing
i_size is properly zeroed to its end.  Our last set of fixes for this
made sure we were doing that correctly.  Or so I thought.
	Your patch description looks like you are worried that we don't
zero from EOF to End-Of-Block.  Have you proof we've failed that again?
Or are you worried that we don't zero from EOB to End-Of-Cluster?  We
used to zero to EOC, but we now know that to be a mistake.

Joel

-- 

"Sometimes one pays most for the things one gets for nothing."
        - Albert Einstein

			http://www.jlbec.org/
			jlbec at evilplan.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Ocfs2-devel] [PATCH] Zero from EOF instead of next block
  2011-02-14  7:25 ` Joel Becker
@ 2011-02-14 18:24   ` Goldwyn Rodrigues
  2011-02-14 22:00     ` Joel Becker
  0 siblings, 1 reply; 9+ messages in thread
From: Goldwyn Rodrigues @ 2011-02-14 18:24 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Feb 14, 2011 at 1:25 AM, Joel Becker <jlbec@evilplan.org> wrote:
> On Mon, Feb 14, 2011 at 12:21:41AM -0600, Goldwyn Rodrigues wrote:
>> ocfs2_zero_extend() extends the file from the next page offset
>> rather from the current file size. zero_extend_page() should
>> call ocfs2_zero_partial_cluster() to set zero rest of the cluster
>> when the file size is within the page boundaries.
>
> ? ? ? ?We just went over this. ?It is intentional that we start from
> the next block. ?Linux code always assumes that the block containing
> i_size is properly zeroed to its end. ?Our last set of fixes for this
> made sure we were doing that correctly. ?Or so I thought.
> ? ? ? ?Your patch description looks like you are worried that we don't
> zero from EOF to End-Of-Block. ?Have you proof we've failed that again?
> Or are you worried that we don't zero from EOB to End-Of-Cluster? ?We
> used to zero to EOC, but we now know that to be a mistake.
>

Why is zeroing to EOC a mistake?

I ran my tests again and found the problem to be from EOB to
End-of-cluster, and of course this happens only with
blocksize<clustersize. So, in ocfs2_zero_extend rewriting zero_to_size
to end-of-cluster works correctly  -

zero_to_size = ocfs2_align_bytes_to_clusters(sb, zero_to_size);

The test case is attached. Modify run_ocfs2_sparse_writing_test.sh accordingly.

-- 
Goldwyn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ocfs2_sparse_writing_test_kit.tbz
Type: application/x-bzip-compressed-tar
Size: 2558 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20110214/0b0c8fb7/attachment.bin 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Ocfs2-devel] [PATCH] Zero from EOF instead of next block
  2011-02-14 18:24   ` Goldwyn Rodrigues
@ 2011-02-14 22:00     ` Joel Becker
  2011-02-14 22:02       ` Joel Becker
  2011-02-14 22:16       ` Goldwyn Rodrigues
  0 siblings, 2 replies; 9+ messages in thread
From: Joel Becker @ 2011-02-14 22:00 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Feb 14, 2011 at 12:24:01PM -0600, Goldwyn Rodrigues wrote:
> On Mon, Feb 14, 2011 at 1:25 AM, Joel Becker <jlbec@evilplan.org> wrote:
> Why is zeroing to EOC a mistake?

	Because Linux writeback trims all writes to EOB for the block
surrounding i_size.  So when you fill pages to EOC, and blocksize !=
clustersize, you end up with pages that are never flushed to disk.
This becomes a problem.  In reflink CoW, it ends up being a BUG().
	See
http://kerneltrap.org/mailarchive/linux-kernel/2010/6/28/4587750 for the
entire discussion.  The final fix patches ended going in as:

a4bfb4c ocfs2: When zero extending, do it by page.
5693486 ocfs2: Zero the tail cluster when extending past i_size.
693c241 ocfs2: No need to zero pages past i_size.

> I ran my tests again and found the problem to be from EOB to
> End-of-cluster, and of course this happens only with
> blocksize<clustersize. So, in ocfs2_zero_extend rewriting zero_to_size
> to end-of-cluster works correctly  -

	I'm going to check out your test case, but please explain what
you think is broken.

Joel

-- 

Life's Little Instruction Book #24

	"Drink champagne for no reason at all."

			http://www.jlbec.org/
			jlbec at evilplan.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Ocfs2-devel] [PATCH] Zero from EOF instead of next block
  2011-02-14 22:00     ` Joel Becker
@ 2011-02-14 22:02       ` Joel Becker
  2011-02-14 22:16       ` Goldwyn Rodrigues
  1 sibling, 0 replies; 9+ messages in thread
From: Joel Becker @ 2011-02-14 22:02 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Feb 14, 2011 at 02:00:22PM -0800, Joel Becker wrote:
> 	I'm going to check out your test case, but please explain what
> you think is broken.

	Argh, it's C code.  I don't have time today.  Please see my
testcase in ocfs2-test:

http://oss.oracle.com/git/?p=ocfs2-test.git;a=commit;h=271a35defdd7101fba300343f5784b3255cb7b64

Let me know what I did wrong ;-)

Joel

-- 

"The nice thing about egotists is that they don't talk about other
 people."
         - Lucille S. Harper

			http://www.jlbec.org/
			jlbec at evilplan.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Ocfs2-devel] [PATCH] Zero from EOF instead of next block
  2011-02-14 22:00     ` Joel Becker
  2011-02-14 22:02       ` Joel Becker
@ 2011-02-14 22:16       ` Goldwyn Rodrigues
  2011-02-14 22:25         ` Joel Becker
  1 sibling, 1 reply; 9+ messages in thread
From: Goldwyn Rodrigues @ 2011-02-14 22:16 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Feb 14, 2011 at 4:00 PM, Joel Becker <jlbec@evilplan.org> wrote:
> On Mon, Feb 14, 2011 at 12:24:01PM -0600, Goldwyn Rodrigues wrote:
>> On Mon, Feb 14, 2011 at 1:25 AM, Joel Becker <jlbec@evilplan.org> wrote:
>> Why is zeroing to EOC a mistake?
>
> ? ? ? ?Because Linux writeback trims all writes to EOB for the block
> surrounding i_size. ?So when you fill pages to EOC, and blocksize !=
> clustersize, you end up with pages that are never flushed to disk.
> This becomes a problem. ?In reflink CoW, it ends up being a BUG().
> ? ? ? ?See
> http://kerneltrap.org/mailarchive/linux-kernel/2010/6/28/4587750 for the
> entire discussion. ?The final fix patches ended going in as:
>
> a4bfb4c ocfs2: When zero extending, do it by page.
> 5693486 ocfs2: Zero the tail cluster when extending past i_size.
> 693c241 ocfs2: No need to zero pages past i_size.
>
>> I ran my tests again and found the problem to be from EOB to
>> End-of-cluster, and of course this happens only with
>> blocksize<clustersize. So, in ocfs2_zero_extend rewriting zero_to_size
>> to end-of-cluster works correctly ?-
>
> ? ? ? ?I'm going to check out your test case, but please explain what
> you think is broken.
>

Oh, I thought the test case was self-explanatory. The test case first writes
a lot of dirty data on disk "0xbaadfeed", syncs the data and removes
the file so that the filesystem has a lot of "0xbaadfeed" data on it.

The second executable writes records of 32 bytes and seeks another 32
bytes. While reading the file back, 0xbaadfeed is read when zeros are
expected. The program shows where the unexpected data is found.

So, for holes not spanning extents or even blocks, the data which was
on disk previously shows up.

Let me know if you need further explanation.

-- 
Goldwyn

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Ocfs2-devel] [PATCH] Zero from EOF instead of next block
  2011-02-14 22:16       ` Goldwyn Rodrigues
@ 2011-02-14 22:25         ` Joel Becker
  2011-02-16  6:26           ` Goldwyn Rodrigues
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Becker @ 2011-02-14 22:25 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Feb 14, 2011 at 04:16:37PM -0600, Goldwyn Rodrigues wrote:
> Oh, I thought the test case was self-explanatory. The test case first writes
> a lot of dirty data on disk "0xbaadfeed", syncs the data and removes
> the file so that the filesystem has a lot of "0xbaadfeed" data on it.
> 
> The second executable writes records of 32 bytes and seeks another 32
> bytes. While reading the file back, 0xbaadfeed is read when zeros are
> expected. The program shows where the unexpected data is found.
> 
> So, for holes not spanning extents or even blocks, the data which was
> on disk previously shows up.

	That shouldn't happen, of course.  But the solution will not be
writing past the last block of the file.  We should be catching it
instead when extending the i_size into a previously uninitialized block.

Joel

-- 

"One of the symptoms of an approaching nervous breakdown is the
 belief that one's work is terribly important."
         - Bertrand Russell 

			http://www.jlbec.org/
			jlbec at evilplan.org

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Ocfs2-devel] [PATCH] Zero from EOF instead of next block
  2011-02-14 22:25         ` Joel Becker
@ 2011-02-16  6:26           ` Goldwyn Rodrigues
  2011-02-16 14:24             ` Goldwyn Rodrigues
  0 siblings, 1 reply; 9+ messages in thread
From: Goldwyn Rodrigues @ 2011-02-16  6:26 UTC (permalink / raw)
  To: ocfs2-devel

On Mon, Feb 14, 2011 at 4:25 PM, Joel Becker <jlbec@evilplan.org> wrote:
> On Mon, Feb 14, 2011 at 04:16:37PM -0600, Goldwyn Rodrigues wrote:
>> Oh, I thought the test case was self-explanatory. The test case first writes
>> a lot of dirty data on disk "0xbaadfeed", syncs the data and removes
>> the file so that the filesystem has a lot of "0xbaadfeed" data on it.
>>
>> The second executable writes records of 32 bytes and seeks another 32
>> bytes. While reading the file back, 0xbaadfeed is read when zeros are
>> expected. The program shows where the unexpected data is found.
>>
>> So, for holes not spanning extents or even blocks, the data which was
>> on disk previously shows up.
>
> ? ? ? ?That shouldn't happen, of course. ?But the solution will not be
> writing past the last block of the file. ?We should be catching it
> instead when extending the i_size into a previously uninitialized block.
>

The problem occurs when the last few bytes of the previous page is a
hole, and the write starts from the beginning of next page. This
causes the page to be read in ocfs2_map_page_blocks(), and since this
is a new block, it picks up garbage from the disk. So, the best way
seems to be enforce it to be a new page when writing from block
boundary by testing against pos.

I tested this patch. However, could you check if this would affect
anything else?

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 1fbb0e2..3dcca2a 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1026,6 +1026,8 @@ static int ocfs2_prepare_page_for_write(struct
inode *inode, u64 *p_blkno,
        ocfs2_figure_cluster_boundaries(OCFS2_SB(inode->i_sb), cpos,
                                        &cluster_start, &cluster_end);

+       new = new | (page_offset(page) == user_pos);
+
        if (page == wc->w_target_page) {
                map_from = user_pos & (PAGE_CACHE_SIZE - 1);
                map_to = map_from + user_len;





-- 
Goldwyn

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Ocfs2-devel] [PATCH] Zero from EOF instead of next block
  2011-02-16  6:26           ` Goldwyn Rodrigues
@ 2011-02-16 14:24             ` Goldwyn Rodrigues
  0 siblings, 0 replies; 9+ messages in thread
From: Goldwyn Rodrigues @ 2011-02-16 14:24 UTC (permalink / raw)
  To: ocfs2-devel

On Wed, Feb 16, 2011 at 12:26 AM, Goldwyn Rodrigues <rgoldwyn@gmail.com> wrote:
> On Mon, Feb 14, 2011 at 4:25 PM, Joel Becker <jlbec@evilplan.org> wrote:
>> On Mon, Feb 14, 2011 at 04:16:37PM -0600, Goldwyn Rodrigues wrote:
>>> Oh, I thought the test case was self-explanatory. The test case first writes
>>> a lot of dirty data on disk "0xbaadfeed", syncs the data and removes
>>> the file so that the filesystem has a lot of "0xbaadfeed" data on it.
>>>
>>> The second executable writes records of 32 bytes and seeks another 32
>>> bytes. While reading the file back, 0xbaadfeed is read when zeros are
>>> expected. The program shows where the unexpected data is found.
>>>
>>> So, for holes not spanning extents or even blocks, the data which was
>>> on disk previously shows up.
>>
>> ? ? ? ?That shouldn't happen, of course. ?But the solution will not be
>> writing past the last block of the file. ?We should be catching it
>> instead when extending the i_size into a previously uninitialized block.
>>
>
> The problem occurs when the last few bytes of the previous page is a
> hole, and the write starts from the beginning of next page. This
> causes the page to be read in ocfs2_map_page_blocks(), and since this
> is a new block, it picks up garbage from the disk. So, the best way
> seems to be enforce it to be a new page when writing from block
> boundary by testing against pos.
>
> I tested this patch. However, could you check if this would affect
> anything else?
>

Of course that is incorrect. It has to be dependent on the file size
as well. The corrected patch is:

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 1fbb0e2..10199e7 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1026,6 +1026,9 @@ static int ocfs2_prepare_page_for_write(struct
inode *inode, u64 *p_blkno,
        ocfs2_figure_cluster_boundaries(OCFS2_SB(inode->i_sb), cpos,
                                        &cluster_start, &cluster_end);

+       new = new | ((page_offset(page) == user_pos)
+                       && (page_offset(page) > i_size_read(inode)));
+
        if (page == wc->w_target_page) {
                map_from = user_pos & (PAGE_CACHE_SIZE - 1);
                map_to = map_from + user_len;



-- 
Goldwyn

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-02-16 14:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-14  6:21 [Ocfs2-devel] [PATCH] Zero from EOF instead of next block Goldwyn Rodrigues
2011-02-14  7:25 ` Joel Becker
2011-02-14 18:24   ` Goldwyn Rodrigues
2011-02-14 22:00     ` Joel Becker
2011-02-14 22:02       ` Joel Becker
2011-02-14 22:16       ` Goldwyn Rodrigues
2011-02-14 22:25         ` Joel Becker
2011-02-16  6:26           ` Goldwyn Rodrigues
2011-02-16 14:24             ` Goldwyn Rodrigues

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.