All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC PATCH: ext4 no journal corruption with locale-gen
@ 2009-06-17 18:48 Curt Wohlgemuth
  2009-06-17 23:46 ` Theodore Tso
  0 siblings, 1 reply; 17+ messages in thread
From: Curt Wohlgemuth @ 2009-06-17 18:48 UTC (permalink / raw)
  To: ext4 development

So I finally was able to figure out the data corruption problem with
locale-gen on ext4 without a journal.

Basically, using mmap(MAP_SHARED, PROT_WRITE) to write to a file through an
mmap'ed pointer is broken on ext4 when there is no journal.

It seems to be a combination of several problems:

   a. The choice of what address space ops to use in ext4_set_aops() just
      seems wrong to me.

   b. The use of ext4_journalled_writepage() if there is no journal being
      used is broken if the page was marked dirty from
      ext4_journalled_set_page_dirty().

      I don't understand how __ext4_journalled_writepage() would ever work.

ext4_set_aops() chooses among 4 different structures:

            ext4_da_aops
            ext4_ordered_aops
            ext4_writeback_aops
            ext4_journalled_aops

It seems to me that ext4_da_aops should be used whenever delayed allocation
is used, and the rest otherwise.  But this leaves open the question of what
to use when nodelalloc is used, AND there's no journal.  Today, this uses
ext4_journalled_aops, but this seems odd on its face.  Yes, I know that all
the routines there are supposed to handle no journal, but it's nevertheless
odd.

The problem with ext4_journalled_writepage() is this:

1. ext4_journalled_set_page_dirty() sets the PageChecked bit, then dirties
   the page.

2. ext4_journalled_writepage() will do the following; note the goto if there
   IS a journal:

===================================================================
	if (ext4_journal_current_handle())
		goto no_write;

	if (PageChecked(page)) {
		/*
		 * It's mmapped pagecache.  Add buffers and journal it.  There
		 * doesn't seem much point in redirtying the page here.
		 */
		ClearPageChecked(page);
		return __ext4_journalled_writepage(page, wbc);
	} else {
		/*
		 * It may be a page full of checkpoint-mode buffers.  We don't
		 * really know unless we go poke around in the buffer_heads.
		 * But block_write_full_page will do the right thing.
		 */
		return block_write_full_page(page,
						ext4_normal_get_block_write,
						wbc);
	}
no_write:
===================================================================

3. Unfortunately, __ext4_journalled_writepage() in the case of no journal,
   will just

      - call block_prepare_write()
      - calls write_end_fn() on all buffers, which just marks them dirty,
        doesn't actually write them out.

   And it doesn't seem to me that __ext4_journalled_writepage() will ever be
   called if there IS a journal.

   Am I missing something here?

4. If PageChecked bit isn't set, it calls block_write_full_page() and works
   fine.


Below is a patch that "fixes" ext4_set_aops(): that is, in the case of
delayed allocation but no journal, we'll use ext4_da_aops.  It does NOT fix
the problem of nodelalloc and either

   - no journal
   - data=journal

I'm holding off on fixing this because I'm not sure of the right place for
it.  I think it should be one of:

   a. Adding an address space ops structure just for nodelalloc/nojournal
   b. Getting rid of __ext4_journalled_writepage() as well as
      ext4_journalled_set_page_dirty(), since I'm not really sure they do
      anything.

Comments please?


	Signed-off-by: Curt Wohlgemuth <curtw@google.com>
---
--- linux-2.6/fs/ext4/inode.c.orig	2009-06-09 20:05:27.000000000 -0700
+++ linux-2.6/fs/ext4/inode.c	2009-06-17 11:07:57.000000000 -0700
@@ -3442,14 +3442,10 @@ static const struct address_space_operat

 void ext4_set_aops(struct inode *inode)
 {
-	if (ext4_should_order_data(inode) &&
-		test_opt(inode->i_sb, DELALLOC))
+	if (test_opt(inode->i_sb, DELALLOC))
 		inode->i_mapping->a_ops = &ext4_da_aops;
 	else if (ext4_should_order_data(inode))
 		inode->i_mapping->a_ops = &ext4_ordered_aops;
-	else if (ext4_should_writeback_data(inode) &&
-		 test_opt(inode->i_sb, DELALLOC))
-		inode->i_mapping->a_ops = &ext4_da_aops;
 	else if (ext4_should_writeback_data(inode))
 		inode->i_mapping->a_ops = &ext4_writeback_aops;
 	else

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-06-17 18:48 RFC PATCH: ext4 no journal corruption with locale-gen Curt Wohlgemuth
@ 2009-06-17 23:46 ` Theodore Tso
  2009-06-22 16:42   ` Curt Wohlgemuth
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Tso @ 2009-06-17 23:46 UTC (permalink / raw)
  To: Curt Wohlgemuth; +Cc: ext4 development

Hi Curt,

Thanks for your analysis of the bug.  The reason for the strange logic
in ext4_set_aops() is because at the moment the code doesn't support
the combination of data=journalled && delalloc.  That's why it was
explicitly checking for ext4_should_order_data() and
ext4_should_writeback_data().

We have a check for this in ext4_fill_super(), so your patch should be
safe, since the combination of ext4_should_journal_data &&
test_opt(inode->i_sb, DELALLOC) should never happen.

As to your question of whether the nodelalloc and nojournal case
should really be ext4_journalled_aops, I suspect ext4_writeback_aops
makes more sense.  I haven't audited all of the code paths to make
sure they DTRT in the non-journalled case yet, though.

							- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-06-17 23:46 ` Theodore Tso
@ 2009-06-22 16:42   ` Curt Wohlgemuth
  2009-06-22 16:56     ` Aneesh Kumar K.V
                       ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Curt Wohlgemuth @ 2009-06-22 16:42 UTC (permalink / raw)
  To: Theodore Tso; +Cc: ext4 development

Hi Ted:

I think the following patch is sufficient.  It explicitly sets the aops to
ext4_writeback_aops if there is no delayed allocation and no journal.

I tested the locale-gen example with all combinations of

   data=writeback
   data=ordered
   data=journal
   <no journal at all>

and

   delalloc
   nodelalloc

and it works correctly now.  The paths for writeback seem fine to me for an
inode w/o a journal.


       Signed-off-by: Curt Wohlgemuth <curtw@google.com>
---
--- 2.6.26/fs/ext4/inode.c.orig	2009-06-09 20:05:27.000000000 -0700
+++ 2.6.26/fs/ext4/inode.c	2009-06-22 08:55:13.000000000 -0700
@@ -3442,15 +3442,12 @@ static const struct address_space_operat

 void ext4_set_aops(struct inode *inode)
 {
-	if (ext4_should_order_data(inode) &&
-		test_opt(inode->i_sb, DELALLOC))
+	if (test_opt(inode->i_sb, DELALLOC))
 		inode->i_mapping->a_ops = &ext4_da_aops;
 	else if (ext4_should_order_data(inode))
 		inode->i_mapping->a_ops = &ext4_ordered_aops;
-	else if (ext4_should_writeback_data(inode) &&
-		 test_opt(inode->i_sb, DELALLOC))
-		inode->i_mapping->a_ops = &ext4_da_aops;
-	else if (ext4_should_writeback_data(inode))
+	else if (ext4_should_writeback_data(inode) ||
+	                         EXT4_JOURNAL(inode) == NULL)
 		inode->i_mapping->a_ops = &ext4_writeback_aops;
 	else
 		inode->i_mapping->a_ops = &ext4_journalled_aops;


On Wed, Jun 17, 2009 at 4:46 PM, Theodore Tso<tytso@mit.edu> wrote:
> Hi Curt,
>
> Thanks for your analysis of the bug.  The reason for the strange logic
> in ext4_set_aops() is because at the moment the code doesn't support
> the combination of data=journalled && delalloc.  That's why it was
> explicitly checking for ext4_should_order_data() and
> ext4_should_writeback_data().
>
> We have a check for this in ext4_fill_super(), so your patch should be
> safe, since the combination of ext4_should_journal_data &&
> test_opt(inode->i_sb, DELALLOC) should never happen.
>
> As to your question of whether the nodelalloc and nojournal case
> should really be ext4_journalled_aops, I suspect ext4_writeback_aops
> makes more sense.  I haven't audited all of the code paths to make
> sure they DTRT in the non-journalled case yet, though.
>
>                                                        - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-06-22 16:42   ` Curt Wohlgemuth
@ 2009-06-22 16:56     ` Aneesh Kumar K.V
  2009-06-22 17:01       ` Curt Wohlgemuth
  2009-06-22 21:13     ` Andreas Dilger
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2009-06-22 16:56 UTC (permalink / raw)
  To: Curt Wohlgemuth; +Cc: Theodore Tso, ext4 development

On Mon, Jun 22, 2009 at 09:42:25AM -0700, Curt Wohlgemuth wrote:
> Hi Ted:
> 
> I think the following patch is sufficient.  It explicitly sets the aops to
> ext4_writeback_aops if there is no delayed allocation and no journal.
> 
> I tested the locale-gen example with all combinations of
> 
>    data=writeback
>    data=ordered
>    data=journal
>    <no journal at all>
> 
> and
> 
>    delalloc
>    nodelalloc
> 
> and it works correctly now.  The paths for writeback seem fine to me for an
> inode w/o a journal.
> 
> 
>        Signed-off-by: Curt Wohlgemuth <curtw@google.com>
> ---
> --- 2.6.26/fs/ext4/inode.c.orig	2009-06-09 20:05:27.000000000 -0700
> +++ 2.6.26/fs/ext4/inode.c	2009-06-22 08:55:13.000000000 -0700
> @@ -3442,15 +3442,12 @@ static const struct address_space_operat
> 
>  void ext4_set_aops(struct inode *inode)
>  {
> -	if (ext4_should_order_data(inode) &&
> -		test_opt(inode->i_sb, DELALLOC))
> +	if (test_opt(inode->i_sb, DELALLOC))

This change is not related to the fix right ?

>  		inode->i_mapping->a_ops = &ext4_da_aops;
>  	else if (ext4_should_order_data(inode))
>  		inode->i_mapping->a_ops = &ext4_ordered_aops;
> -	else if (ext4_should_writeback_data(inode) &&
> -		 test_opt(inode->i_sb, DELALLOC))
> -		inode->i_mapping->a_ops = &ext4_da_aops;
> -	else if (ext4_should_writeback_data(inode))
> +	else if (ext4_should_writeback_data(inode) ||
> +	                         EXT4_JOURNAL(inode) == NULL)
>  		inode->i_mapping->a_ops = &ext4_writeback_aops;

Can you send a patch with this hunk alone. The previous one is not
related to the fix right ?


>  	else
>  		inode->i_mapping->a_ops = &ext4_journalled_aops;
> 
> 

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-06-22 16:56     ` Aneesh Kumar K.V
@ 2009-06-22 17:01       ` Curt Wohlgemuth
  0 siblings, 0 replies; 17+ messages in thread
From: Curt Wohlgemuth @ 2009-06-22 17:01 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Theodore Tso, ext4 development

On Mon, Jun 22, 2009 at 9:56 AM, Aneesh Kumar
K.V<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Mon, Jun 22, 2009 at 09:42:25AM -0700, Curt Wohlgemuth wrote:
>> Hi Ted:
>>
>> I think the following patch is sufficient.  It explicitly sets the aops to
>> ext4_writeback_aops if there is no delayed allocation and no journal.
>>
>> I tested the locale-gen example with all combinations of
>>
>>    data=writeback
>>    data=ordered
>>    data=journal
>>    <no journal at all>
>>
>> and
>>
>>    delalloc
>>    nodelalloc
>>
>> and it works correctly now.  The paths for writeback seem fine to me for an
>> inode w/o a journal.
>>
>>
>>        Signed-off-by: Curt Wohlgemuth <curtw@google.com>
>> ---
>> --- 2.6.26/fs/ext4/inode.c.orig       2009-06-09 20:05:27.000000000 -0700
>> +++ 2.6.26/fs/ext4/inode.c    2009-06-22 08:55:13.000000000 -0700
>> @@ -3442,15 +3442,12 @@ static const struct address_space_operat
>>
>>  void ext4_set_aops(struct inode *inode)
>>  {
>> -     if (ext4_should_order_data(inode) &&
>> -             test_opt(inode->i_sb, DELALLOC))
>> +     if (test_opt(inode->i_sb, DELALLOC))
>
> This change is not related to the fix right ?
>
>>               inode->i_mapping->a_ops = &ext4_da_aops;
>>       else if (ext4_should_order_data(inode))
>>               inode->i_mapping->a_ops = &ext4_ordered_aops;
>> -     else if (ext4_should_writeback_data(inode) &&
>> -              test_opt(inode->i_sb, DELALLOC))
>> -             inode->i_mapping->a_ops = &ext4_da_aops;
>> -     else if (ext4_should_writeback_data(inode))
>> +     else if (ext4_should_writeback_data(inode) ||
>> +                              EXT4_JOURNAL(inode) == NULL)
>>               inode->i_mapping->a_ops = &ext4_writeback_aops;
>
> Can you send a patch with this hunk alone. The previous one is not
> related to the fix right ?

You need them both.  We want all inodes with delayed allocation to use
ext4_da_aops.  The fall-through cases are just for nodelalloc.

Curt

>
>
>>       else
>>               inode->i_mapping->a_ops = &ext4_journalled_aops;
>>
>>
>
> -aneesh
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-06-22 16:42   ` Curt Wohlgemuth
  2009-06-22 16:56     ` Aneesh Kumar K.V
@ 2009-06-22 21:13     ` Andreas Dilger
  2009-07-01 18:35       ` Aneesh Kumar K.V
  2009-06-29 17:50     ` Curt Wohlgemuth
  2009-07-01 18:31     ` Aneesh Kumar K.V
  3 siblings, 1 reply; 17+ messages in thread
From: Andreas Dilger @ 2009-06-22 21:13 UTC (permalink / raw)
  To: Curt Wohlgemuth; +Cc: Theodore Tso, ext4 development

On Jun 22, 2009  09:42 -0700, Curt Wohlgemuth wrote:
> I tested the locale-gen example with all combinations of
> 
>    data=writeback
>    data=ordered
>    data=journal
>    <no journal at all>

On an unrelated note - would it be useful to mount an ext4 filesystem
with a journal using "data=none" (or similar) to run without a journal?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-06-22 16:42   ` Curt Wohlgemuth
  2009-06-22 16:56     ` Aneesh Kumar K.V
  2009-06-22 21:13     ` Andreas Dilger
@ 2009-06-29 17:50     ` Curt Wohlgemuth
  2009-07-01 18:31     ` Aneesh Kumar K.V
  3 siblings, 0 replies; 17+ messages in thread
From: Curt Wohlgemuth @ 2009-06-29 17:50 UTC (permalink / raw)
  To: Theodore Tso; +Cc: ext4 development

Ted, did you have any comment or objections to this patch?

Thanks,
Curt


On Mon, Jun 22, 2009 at 9:42 AM, Curt Wohlgemuth<curtw@google.com> wrote:
> Hi Ted:
>
> I think the following patch is sufficient.  It explicitly sets the aops to
> ext4_writeback_aops if there is no delayed allocation and no journal.
>
> I tested the locale-gen example with all combinations of
>
>   data=writeback
>   data=ordered
>   data=journal
>   <no journal at all>
>
> and
>
>   delalloc
>   nodelalloc
>
> and it works correctly now.  The paths for writeback seem fine to me for an
> inode w/o a journal.
>
>
>       Signed-off-by: Curt Wohlgemuth <curtw@google.com>
> ---
> --- 2.6.26/fs/ext4/inode.c.orig 2009-06-09 20:05:27.000000000 -0700
> +++ 2.6.26/fs/ext4/inode.c      2009-06-22 08:55:13.000000000 -0700
> @@ -3442,15 +3442,12 @@ static const struct address_space_operat
>
>  void ext4_set_aops(struct inode *inode)
>  {
> -       if (ext4_should_order_data(inode) &&
> -               test_opt(inode->i_sb, DELALLOC))
> +       if (test_opt(inode->i_sb, DELALLOC))
>                inode->i_mapping->a_ops = &ext4_da_aops;
>        else if (ext4_should_order_data(inode))
>                inode->i_mapping->a_ops = &ext4_ordered_aops;
> -       else if (ext4_should_writeback_data(inode) &&
> -                test_opt(inode->i_sb, DELALLOC))
> -               inode->i_mapping->a_ops = &ext4_da_aops;
> -       else if (ext4_should_writeback_data(inode))
> +       else if (ext4_should_writeback_data(inode) ||
> +                                EXT4_JOURNAL(inode) == NULL)
>                inode->i_mapping->a_ops = &ext4_writeback_aops;
>        else
>                inode->i_mapping->a_ops = &ext4_journalled_aops;
>
>
> On Wed, Jun 17, 2009 at 4:46 PM, Theodore Tso<tytso@mit.edu> wrote:
>> Hi Curt,
>>
>> Thanks for your analysis of the bug.  The reason for the strange logic
>> in ext4_set_aops() is because at the moment the code doesn't support
>> the combination of data=journalled && delalloc.  That's why it was
>> explicitly checking for ext4_should_order_data() and
>> ext4_should_writeback_data().
>>
>> We have a check for this in ext4_fill_super(), so your patch should be
>> safe, since the combination of ext4_should_journal_data &&
>> test_opt(inode->i_sb, DELALLOC) should never happen.
>>
>> As to your question of whether the nodelalloc and nojournal case
>> should really be ext4_journalled_aops, I suspect ext4_writeback_aops
>> makes more sense.  I haven't audited all of the code paths to make
>> sure they DTRT in the non-journalled case yet, though.
>>
>>                                                        - Ted
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-06-22 16:42   ` Curt Wohlgemuth
                       ` (2 preceding siblings ...)
  2009-06-29 17:50     ` Curt Wohlgemuth
@ 2009-07-01 18:31     ` Aneesh Kumar K.V
  2009-07-06  3:41       ` Theodore Tso
  3 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2009-07-01 18:31 UTC (permalink / raw)
  To: Curt Wohlgemuth; +Cc: Theodore Tso, ext4 development

On Mon, Jun 22, 2009 at 09:42:25AM -0700, Curt Wohlgemuth wrote:
> Hi Ted:
> 
> I think the following patch is sufficient.  It explicitly sets the aops to
> ext4_writeback_aops if there is no delayed allocation and no journal.
> 
> I tested the locale-gen example with all combinations of
> 
>    data=writeback
>    data=ordered
>    data=journal
>    <no journal at all>
> 
> and
> 
>    delalloc
>    nodelalloc
> 
> and it works correctly now.  The paths for writeback seem fine to me for an
> inode w/o a journal.
> 
> 
>        Signed-off-by: Curt Wohlgemuth <curtw@google.com>
> ---
> --- 2.6.26/fs/ext4/inode.c.orig	2009-06-09 20:05:27.000000000 -0700
> +++ 2.6.26/fs/ext4/inode.c	2009-06-22 08:55:13.000000000 -0700
> @@ -3442,15 +3442,12 @@ static const struct address_space_operat
> 
>  void ext4_set_aops(struct inode *inode)
>  {
> -	if (ext4_should_order_data(inode) &&
> -		test_opt(inode->i_sb, DELALLOC))
> +	if (test_opt(inode->i_sb, DELALLOC))
>  		inode->i_mapping->a_ops = &ext4_da_aops;
>  	else if (ext4_should_order_data(inode))
>  		inode->i_mapping->a_ops = &ext4_ordered_aops;
> -	else if (ext4_should_writeback_data(inode) &&
> -		 test_opt(inode->i_sb, DELALLOC))
> -		inode->i_mapping->a_ops = &ext4_da_aops;
> -	else if (ext4_should_writeback_data(inode))
> +	else if (ext4_should_writeback_data(inode) ||
> +	                         EXT4_JOURNAL(inode) == NULL)
>  		inode->i_mapping->a_ops = &ext4_writeback_aops;
>  	else
>  		inode->i_mapping->a_ops = &ext4_journalled_aops;
> 
> 

I looked at the patch in detail and  I guess we should instead force
a data=writeback mode if the filesystem is created without a journal.
I am not sure what whould be the meaning of data=ordered/data=journal
without a journal. So if we find that file system doesn't have a journal
then either we should update the default mount option in the filesystem
to be of data=writeback. Also if the user tried to mount with
data=ordered or data=journal we should print appropriate message and
force ourself to data=writeback.

Once we have data=writeback set then ext4_set_aops will handle the  case
properly.

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-06-22 21:13     ` Andreas Dilger
@ 2009-07-01 18:35       ` Aneesh Kumar K.V
  2009-07-01 18:44         ` Michael Rubin
  2009-07-02 23:27         ` Xiang Wang
  0 siblings, 2 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2009-07-01 18:35 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Curt Wohlgemuth, Theodore Tso, ext4 development

On Mon, Jun 22, 2009 at 11:13:27PM +0200, Andreas Dilger wrote:
> On Jun 22, 2009  09:42 -0700, Curt Wohlgemuth wrote:
> > I tested the locale-gen example with all combinations of
> > 
> >    data=writeback
> >    data=ordered
> >    data=journal
> >    <no journal at all>
> 
> On an unrelated note - would it be useful to mount an ext4 filesystem
> with a journal using "data=none" (or similar) to run without a journal?
> 

I think this is better. I would suggest data=nojournal. That way we can
check the mount options to figure out whether we are running with
journal or not. Also i guess this enables us to run without using a
journal even if  mke2fs created a journal for us

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-07-01 18:35       ` Aneesh Kumar K.V
@ 2009-07-01 18:44         ` Michael Rubin
  2009-07-02 23:27         ` Xiang Wang
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Rubin @ 2009-07-01 18:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andreas Dilger, Curt Wohlgemuth, Theodore Tso, ext4 development

On Wed, Jul 1, 2009 at 11:35 AM, Aneesh Kumar
K.V<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Mon, Jun 22, 2009 at 11:13:27PM +0200, Andreas Dilger wrote:
>> On Jun 22, 2009  09:42 -0700, Curt Wohlgemuth wrote:
> I think this is better. I would suggest data=nojournal. That way we can
> check the mount options to figure out whether we are running with
> journal or not. Also i guess this enables us to run without using a
> journal even if  mke2fs created a journal for us
>

As a heavy consumer of the non-journal mode I really like this idea.
It will make many things easier and clear for us.

mrubin
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-07-01 18:35       ` Aneesh Kumar K.V
  2009-07-01 18:44         ` Michael Rubin
@ 2009-07-02 23:27         ` Xiang Wang
  1 sibling, 0 replies; 17+ messages in thread
From: Xiang Wang @ 2009-07-02 23:27 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andreas Dilger, Curt Wohlgemuth, Theodore Tso, ext4 development

On Wed, Jul 1, 2009 at 11:35 AM, Aneesh Kumar
K.V<aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Mon, Jun 22, 2009 at 11:13:27PM +0200, Andreas Dilger wrote:
>> On Jun 22, 2009  09:42 -0700, Curt Wohlgemuth wrote:
>> > I tested the locale-gen example with all combinations of
>> >
>> >    data=writeback
>> >    data=ordered
>> >    data=journal
>> >    <no journal at all>
>>
>> On an unrelated note - would it be useful to mount an ext4 filesystem
>> with a journal using "data=none" (or similar) to run without a journal?
>>
>
> I think this is better. I would suggest data=nojournal. That way we can
> check the mount options to figure out whether we are running with
> journal or not. Also i guess this enables us to run without using a
> journal even if  mke2fs created a journal for us

We think adding this "data=nojournal" mount option is a very good idea.
To be more specific, after adding this option, the semantics will be:

If mke2fs does not create a journal, then we should use the
"data=nojournal" mount option only.

If mke2fs creates a journal for us, we are allowed to use one of the
following 4 mount options:
data=nojournal
data=writeback
data=ordered
data=journal

We are now working on this patch and will submit it in a couple of weeks.

Meanwhile if Ted could take the most recent patch from Curt to fix the data
corruption problem, that would be great!

Thanks,
Xiang

> -aneesh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-07-01 18:31     ` Aneesh Kumar K.V
@ 2009-07-06  3:41       ` Theodore Tso
  2009-07-06 15:30         ` Aneesh Kumar K.V
  2009-07-06 16:21         ` Xiang Wang
  0 siblings, 2 replies; 17+ messages in thread
From: Theodore Tso @ 2009-07-06  3:41 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Curt Wohlgemuth, ext4 development

On Thu, Jul 02, 2009 at 12:01:30AM +0530, Aneesh Kumar K.V wrote:
> 
> I looked at the patch in detail and  I guess we should instead force
> a data=writeback mode if the filesystem is created without a journal.
> I am not sure what whould be the meaning of data=ordered/data=journal
> without a journal. So if we find that file system doesn't have a journal
> then either we should update the default mount option in the filesystem
> to be of data=writeback. 

Here's a patch which takes your approach to solving the problem.  What
do you think? 

I haven't messed with dealing with the data= mount options in
fs/ext4/super.c.  That's important from a UI point of view, but we
needed to fix ext4_jbd2.h since it was unconditionally returning 0 if
there was no journal for all of the ext4_should_*_data() functions.

I believe this should DTRT with the -o nobh mount option, but I'd
appreciate another pair of eyes taking a look at this.

	   	   	   	       	 - Ted

commit 2a73eff8ba80095a871a6b402dfd24bc454e5bdc
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sun Jul 5 23:37:13 2009 -0400

    ext4: fix no journal corruption with locale-gen
    
    If there is no journal, ext4_should_writeback_data() should return
    TRUE.  This will fix ext4_set_aops() to set ext4_da_ops in the case of
    delayed allocation; otherwise ext4_journaled_aops gets used by
    default, which doesn't handle delayed allocation properly.
    
    The advantage of using ext4_should_writeback_data() approach is that
    it should handle nobh better as well.
    
    Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
    Kumar for suggesting this approach.
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index be2f426..f800134 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -282,7 +282,7 @@ static inline int ext4_should_order_data(struct inode *inode)
 static inline int ext4_should_writeback_data(struct inode *inode)
 {
 	if (EXT4_JOURNAL(inode) == NULL)
-		return 0;
+		return 1;
 	if (!S_ISREG(inode->i_mode))
 		return 0;
 	if (EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL)

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-07-06  3:41       ` Theodore Tso
@ 2009-07-06 15:30         ` Aneesh Kumar K.V
  2009-07-13 13:05           ` Theodore Tso
  2009-07-06 16:21         ` Xiang Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2009-07-06 15:30 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Curt Wohlgemuth, ext4 development

On Sun, Jul 05, 2009 at 11:41:44PM -0400, Theodore Tso wrote:
> On Thu, Jul 02, 2009 at 12:01:30AM +0530, Aneesh Kumar K.V wrote:
> > 
> > I looked at the patch in detail and  I guess we should instead force
> > a data=writeback mode if the filesystem is created without a journal.
> > I am not sure what whould be the meaning of data=ordered/data=journal
> > without a journal. So if we find that file system doesn't have a journal
> > then either we should update the default mount option in the filesystem
> > to be of data=writeback. 
> 
> Here's a patch which takes your approach to solving the problem.  What
> do you think? 
> 
> I haven't messed with dealing with the data= mount options in
> fs/ext4/super.c.  That's important from a UI point of view, but we
> needed to fix ext4_jbd2.h since it was unconditionally returning 0 if
> there was no journal for all of the ext4_should_*_data() functions.
> 
> I believe this should DTRT with the -o nobh mount option, but I'd
> appreciate another pair of eyes taking a look at this.
> 
> 	   	   	   	       	 - Ted
> 
> commit 2a73eff8ba80095a871a6b402dfd24bc454e5bdc
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Sun Jul 5 23:37:13 2009 -0400
> 
>     ext4: fix no journal corruption with locale-gen
>     
>     If there is no journal, ext4_should_writeback_data() should return
>     TRUE.  This will fix ext4_set_aops() to set ext4_da_ops in the case of
>     delayed allocation; otherwise ext4_journaled_aops gets used by
>     default, which doesn't handle delayed allocation properly.
>     
>     The advantage of using ext4_should_writeback_data() approach is that
>     it should handle nobh better as well.
>     
>     Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
>     Kumar for suggesting this approach.
>     
>     Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> 
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index be2f426..f800134 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -282,7 +282,7 @@ static inline int ext4_should_order_data(struct inode *inode)
>  static inline int ext4_should_writeback_data(struct inode *inode)
>  {
>  	if (EXT4_JOURNAL(inode) == NULL)
> -		return 0;
> +		return 1;
>  	if (!S_ISREG(inode->i_mode))
>  		return 0;
>  	if (EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL)

We may want to change it after	if (!S_ISREG(inode->i_mode))
So that we don't return 1 for other than regular files.

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-07-06  3:41       ` Theodore Tso
  2009-07-06 15:30         ` Aneesh Kumar K.V
@ 2009-07-06 16:21         ` Xiang Wang
  2009-07-09 18:30           ` Xiang Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Xiang Wang @ 2009-07-06 16:21 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Aneesh Kumar K.V, Curt Wohlgemuth, ext4 development

On Sun, Jul 5, 2009 at 8:41 PM, Theodore Tso<tytso@mit.edu> wrote:
> On Thu, Jul 02, 2009 at 12:01:30AM +0530, Aneesh Kumar K.V wrote:
>>
>> I looked at the patch in detail and  I guess we should instead force
>> a data=writeback mode if the filesystem is created without a journal.
>> I am not sure what whould be the meaning of data=ordered/data=journal
>> without a journal. So if we find that file system doesn't have a journal
>> then either we should update the default mount option in the filesystem
>> to be of data=writeback.
>
> Here's a patch which takes your approach to solving the problem.  What
> do you think?
>
> I haven't messed with dealing with the data= mount options in
> fs/ext4/super.c.  That's important from a UI point of view, but we
> needed to fix ext4_jbd2.h since it was unconditionally returning 0 if
> there was no journal for all of the ext4_should_*_data() functions.
>
> I believe this should DTRT with the -o nobh mount option, but I'd
> appreciate another pair of eyes taking a look at this.

This patch looks good to us.

In the long run, we still think adding the data=nojournal mount option
is useful and we are working on this patch.

Thanks,
Xiang

>
>                                         - Ted
>
> commit 2a73eff8ba80095a871a6b402dfd24bc454e5bdc
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Sun Jul 5 23:37:13 2009 -0400
>
>    ext4: fix no journal corruption with locale-gen
>
>    If there is no journal, ext4_should_writeback_data() should return
>    TRUE.  This will fix ext4_set_aops() to set ext4_da_ops in the case of
>    delayed allocation; otherwise ext4_journaled_aops gets used by
>    default, which doesn't handle delayed allocation properly.
>
>    The advantage of using ext4_should_writeback_data() approach is that
>    it should handle nobh better as well.
>
>    Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
>    Kumar for suggesting this approach.
>
>    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
>
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index be2f426..f800134 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -282,7 +282,7 @@ static inline int ext4_should_order_data(struct inode *inode)
>  static inline int ext4_should_writeback_data(struct inode *inode)
>  {
>        if (EXT4_JOURNAL(inode) == NULL)
> -               return 0;
> +               return 1;
>        if (!S_ISREG(inode->i_mode))
>                return 0;
>        if (EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-07-06 16:21         ` Xiang Wang
@ 2009-07-09 18:30           ` Xiang Wang
  2009-07-10  7:28             ` Aneesh Kumar K.V
  0 siblings, 1 reply; 17+ messages in thread
From: Xiang Wang @ 2009-07-09 18:30 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Aneesh Kumar K.V, Curt Wohlgemuth, ext4 development

When working on the patch of adding the data=nojournal mount option,
I start to wonder whether this mount option is actually needed.

When we mount a filesystem that was mkfs'ed with journal, using the
"noload" mount option
can help specify we do not load the journal.

When we mount a filesystem that was mkfs'ed without journal, we simply
go into the
nojournal mode.

That said, I do not really feel this data=nojournal option is necessary.
But I am still working on the patch to print appropriate messages when
people mount a filesystem
created without a journal but explicitly specify the "data=" option.
Any comments?

Thanks,
Xiang

On Mon, Jul 6, 2009 at 9:21 AM, Xiang Wang<xiangw@google.com> wrote:
> On Sun, Jul 5, 2009 at 8:41 PM, Theodore Tso<tytso@mit.edu> wrote:
>> On Thu, Jul 02, 2009 at 12:01:30AM +0530, Aneesh Kumar K.V wrote:
>>>
>>> I looked at the patch in detail and  I guess we should instead force
>>> a data=writeback mode if the filesystem is created without a journal.
>>> I am not sure what whould be the meaning of data=ordered/data=journal
>>> without a journal. So if we find that file system doesn't have a journal
>>> then either we should update the default mount option in the filesystem
>>> to be of data=writeback.
>>
>> Here's a patch which takes your approach to solving the problem.  What
>> do you think?
>>
>> I haven't messed with dealing with the data= mount options in
>> fs/ext4/super.c.  That's important from a UI point of view, but we
>> needed to fix ext4_jbd2.h since it was unconditionally returning 0 if
>> there was no journal for all of the ext4_should_*_data() functions.
>>
>> I believe this should DTRT with the -o nobh mount option, but I'd
>> appreciate another pair of eyes taking a look at this.
>
> This patch looks good to us.
>
> In the long run, we still think adding the data=nojournal mount option
> is useful and we are working on this patch.
>
> Thanks,
> Xiang
>
>>
>>                                         - Ted
>>
>> commit 2a73eff8ba80095a871a6b402dfd24bc454e5bdc
>> Author: Theodore Ts'o <tytso@mit.edu>
>> Date:   Sun Jul 5 23:37:13 2009 -0400
>>
>>    ext4: fix no journal corruption with locale-gen
>>
>>    If there is no journal, ext4_should_writeback_data() should return
>>    TRUE.  This will fix ext4_set_aops() to set ext4_da_ops in the case of
>>    delayed allocation; otherwise ext4_journaled_aops gets used by
>>    default, which doesn't handle delayed allocation properly.
>>
>>    The advantage of using ext4_should_writeback_data() approach is that
>>    it should handle nobh better as well.
>>
>>    Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
>>    Kumar for suggesting this approach.
>>
>>    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
>>
>> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
>> index be2f426..f800134 100644
>> --- a/fs/ext4/ext4_jbd2.h
>> +++ b/fs/ext4/ext4_jbd2.h
>> @@ -282,7 +282,7 @@ static inline int ext4_should_order_data(struct inode *inode)
>>  static inline int ext4_should_writeback_data(struct inode *inode)
>>  {
>>        if (EXT4_JOURNAL(inode) == NULL)
>> -               return 0;
>> +               return 1;
>>        if (!S_ISREG(inode->i_mode))
>>                return 0;
>>        if (EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL)
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-07-09 18:30           ` Xiang Wang
@ 2009-07-10  7:28             ` Aneesh Kumar K.V
  0 siblings, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2009-07-10  7:28 UTC (permalink / raw)
  To: Xiang Wang; +Cc: Theodore Tso, Curt Wohlgemuth, ext4 development

On Thu, Jul 09, 2009 at 11:30:58AM -0700, Xiang Wang wrote:
> When working on the patch of adding the data=nojournal mount option,
> I start to wonder whether this mount option is actually needed.
> 
> When we mount a filesystem that was mkfs'ed with journal, using the
> "noload" mount option
> can help specify we do not load the journal.
> 
> When we mount a filesystem that was mkfs'ed without journal, we simply
> go into the
> nojournal mode.
> 
> That said, I do not really feel this data=nojournal option is necessary.
> But I am still working on the patch to print appropriate messages when
> people mount a filesystem
> created without a journal but explicitly specify the "data=" option.
> Any comments?
> 

Now that I look at commit 0390131ba84fd3f726f9e24fc4553828125700bb
if i understood correctly, that should always force a no journal 
mount to data=writeback ?

I tested it as below

 mkfs.ext4 -O ^has_journal /home/opensource/images/ext3.img
 sudo mount -o loop /home/opensource/images/ext3.img  /mnt/
[master@linux-2.6]$ cat /proc/mounts  | grep mnt
/dev/loop0 /mnt ext4 rw,relatime,barrier=1,data=writeback 0 0

So what we really need is to update ext4_should_writeback_data
not to return 0 when we  don't have a journal. Ted already have a patch
in the patch queue that does the same. I guess that patch should get
everything working fine.  So if mke2fs have created a journal to run
in no journal mode one can say -o noload. I guess that is more or less
what we wanted.

-aneesh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: RFC PATCH: ext4 no journal corruption with locale-gen
  2009-07-06 15:30         ` Aneesh Kumar K.V
@ 2009-07-13 13:05           ` Theodore Tso
  0 siblings, 0 replies; 17+ messages in thread
From: Theodore Tso @ 2009-07-13 13:05 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Curt Wohlgemuth, ext4 development

On Mon, Jul 06, 2009 at 09:00:16PM +0530, Aneesh Kumar K.V wrote:
> > diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> > index be2f426..f800134 100644
> > --- a/fs/ext4/ext4_jbd2.h
> > +++ b/fs/ext4/ext4_jbd2.h
> > @@ -282,7 +282,7 @@ static inline int ext4_should_order_data(struct inode *inode)
> >  static inline int ext4_should_writeback_data(struct inode *inode)
> >  {
> >  	if (EXT4_JOURNAL(inode) == NULL)
> > -		return 0;
> > +		return 1;
> >  	if (!S_ISREG(inode->i_mode))
> >  		return 0;
> >  	if (EXT4_I(inode)->i_flags & EXT4_JOURNAL_DATA_FL)
> 
> We may want to change it after	if (!S_ISREG(inode->i_mode))
> So that we don't return 1 for other than regular files.

Thanks for the suggestion; I've made this change.

						- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2009-07-13 13:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-17 18:48 RFC PATCH: ext4 no journal corruption with locale-gen Curt Wohlgemuth
2009-06-17 23:46 ` Theodore Tso
2009-06-22 16:42   ` Curt Wohlgemuth
2009-06-22 16:56     ` Aneesh Kumar K.V
2009-06-22 17:01       ` Curt Wohlgemuth
2009-06-22 21:13     ` Andreas Dilger
2009-07-01 18:35       ` Aneesh Kumar K.V
2009-07-01 18:44         ` Michael Rubin
2009-07-02 23:27         ` Xiang Wang
2009-06-29 17:50     ` Curt Wohlgemuth
2009-07-01 18:31     ` Aneesh Kumar K.V
2009-07-06  3:41       ` Theodore Tso
2009-07-06 15:30         ` Aneesh Kumar K.V
2009-07-13 13:05           ` Theodore Tso
2009-07-06 16:21         ` Xiang Wang
2009-07-09 18:30           ` Xiang Wang
2009-07-10  7:28             ` Aneesh Kumar K.V

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.