From: Eric Sandeen <sandeen@redhat.com>
To: "Theodore Ts'o" <tytso@mit.edu>, Nix <nix@esperi.org.uk>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	gregkh@linuxfoundation.org
Subject: Re: [PATCH -v3] ext4: fix unjournaled inode bitmap modification
Date: Sun, 28 Oct 2012 22:24:47 -0500
Message-ID: <508DF6FF.9090209@redhat.com>
In-Reply-To: <20121029023034.GA9365@thunk.org>

On 10/28/12 9:30 PM, Theodore Ts'o wrote:
> On Sat, Oct 27, 2012 at 11:23:57PM -0500, Eric Sandeen wrote:
>> A little more going on here to try to properly handle error
>> cases & moving to the next group; despite
>> ext4_handle_release_buffer being a no-op, I've tried
>> to sprinkle it in at the right places.  Double checking
>> on review is probably a fine idea ;)
> 
> Sorry, I didn't see your newer version of your patch.  I'm not
> convinced it's worth it to try to get the calls to
> ext4_handle_release_buffer() right.  There are plenty of other places
> where we're not calling ext4_handle_release_buffer(), and I'm not
> convinced it would ever be useful to make it be something other than a
> no-op.  

Fair enough, I went a little overboard.

> In order to make it be useful, we'd have to enforce a rule
> that every single get_write_access() was matched with either a
> handle_dirty_metadata() or a handle_release_buffer().  That would be
> tricky; worse, we'd have to keep track of a refcount on each bh, which
> would cost us on the scalability front.  The main benefit would be
> that we might be able to reclaim bh's where we called
> get_write_access() and then changed our mind, but that's relatively
> rare, and I think it's easier to simply be more careful about not
> calling get_write_access() until we're sure we're going to need write
> access.
> 
> Hence in my version of the patch, I've waited until right before the
> call to ext4_lock_group() before calling get_write_access().  Note
> that it's safe to call get_write_access() on a bh twice; the second
> time the jbd2 layer will notice that the bh is already a part of the
> transaction.

Yeah, I guess that's the norm.

So on the one hand you delay calling it until we're sure we need
it; OTOH it's no big deal if it does get called twice :)
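
To make those two points concrete, here is a minimal userspace sketch.
It is a toy model only, not jbd2 or ext4 code: the toy_* names, the
structs, and the return conventions are all hypothetical stand-ins for
ext4_journal_get_write_access() and ext4_handle_dirty_metadata().  It
shows write access being taken before the bitmap bit is set, the buffer
being dirtied afterwards, and a repeated get-write-access call on the
same buffer being harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a buffer_head; NOT the kernel structure. */
struct toy_buffer {
	unsigned long bits;	/* pretend inode bitmap */
	bool in_transaction;	/* already part of the running transaction? */
	bool dirty;		/* marked dirty for the journal? */
};

/* Toy stand-in for a journal handle. */
struct toy_handle {
	int buffers_joined;	/* distinct buffers joined via this handle */
};

/*
 * Hypothetical analogue of ext4_journal_get_write_access(): the first
 * call joins the buffer to the transaction, later calls notice it is
 * already there and do nothing.
 */
int toy_get_write_access(struct toy_handle *h, struct toy_buffer *bh)
{
	if (!h || !bh)
		return -1;
	if (!bh->in_transaction) {
		bh->in_transaction = true;
		h->buffers_joined++;
	}
	return 0;
}

/*
 * Hypothetical analogue of ext4_handle_dirty_metadata(): in this toy
 * model it refuses a buffer that never got write access.
 */
int toy_dirty_metadata(struct toy_handle *h, struct toy_buffer *bh)
{
	if (!h || !bh || !bh->in_transaction)
		return -1;
	bh->dirty = true;
	return 0;
}

/*
 * Claim bit 'ino' in the bitmap.  Returns 0 on success, 1 if the bit
 * was already taken, -1 on error.  Ordering mirrors the discussion:
 * get write access, modify the bitmap, then mark the buffer dirty.
 */
int toy_claim_inode(struct toy_handle *h, struct toy_buffer *bh,
		    unsigned int ino)
{
	unsigned long mask;

	if (!h || !bh || ino >= 8 * sizeof(bh->bits))
		return -1;
	if (toy_get_write_access(h, bh))
		return -1;

	mask = 1UL << ino;
	if (bh->bits & mask)
		return 1;	/* lost the race; caller tries another bit */
	bh->bits |= mask;

	return toy_dirty_metadata(h, bh);
}
```

In this sketch a failed claim still leaves the buffer joined to the
transaction, which is the "got write access and then changed our mind"
case; the toy model simply tolerates it, with no release call needed.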

> Also, leaving out the calls to ext4_handle_release_buffer() makes the
> patch easier to understand and reason about.

Fair enough.

> What do you think of this version?

Looks fine, tests fine.  Ship it ;)

-Eric

> 						- Ted
> 
> commit 67d725143e9e7ea458a0c1c4a6625657c3dc7ba2
> Author: Eric Sandeen <sandeen@redhat.com>
> Date:   Sun Oct 28 22:24:57 2012 -0400
> 
>     ext4: fix unjournaled inode bitmap modification
>     
>     commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed
>     ext4_new_inode() such that the inode bitmap was being modified
>     outside a transaction, which could lead to corruption, and was
>     discovered when journal_checksum found a bad checksum in the
>     journal during log replay.
>     
>     Nix ran into this when using the journal_async_commit mount
>     option, which enables journal checksumming.  The ensuing
>     journal replay failures due to the bad checksums led to
>     filesystem corruption reported as the now infamous
>     "Apparent serious progressive ext4 data corruption bug"
>     
>     [ Changed by tytso to call ext4_journal_get_write_access() only
>       when we're fairly certain that we're going to allocate the inode. ]
>     
>     I've tested this by mounting with journal_checksum and
>     running fsstress then dropping power; I've also tested by
>     hacking DM to create snapshots w/o first quiescing, which
>     allows me to test journal replay repeatedly w/o actually
>     power-cycling the box.  Without the patch I hit a journal
>     checksum error every time.  With this fix it survives
>     many iterations.
>     
>     Reported-by: Nix <nix@esperi.org.uk>
>     Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>     Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
>     Cc: stable@vger.kernel.org
> 
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 4facdd2..3a100e7 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -725,6 +725,10 @@ repeat_in_this_group:
>  				   "inode=%lu", ino + 1);
>  			continue;
>  		}
> +		BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
> +		err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
> +		if (err)
> +			goto fail;
>  		ext4_lock_group(sb, group);
>  		ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
>  		ext4_unlock_group(sb, group);
> @@ -738,6 +742,11 @@ repeat_in_this_group:
>  	goto out;
>  
>  got:
> +	BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
> +	err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
> +	if (err)
> +		goto fail;
> +
>  	/* We may have to initialize the block bitmap if it isn't already */
>  	if (ext4_has_group_desc_csum(sb) &&
>  	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
> @@ -771,11 +780,6 @@ got:
>  			goto fail;
>  	}
>  
> -	BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
> -	err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
> -	if (err)
> -		goto fail;
> -
>  	BUFFER_TRACE(group_desc_bh, "get_write_access");
>  	err = ext4_journal_get_write_access(handle, group_desc_bh);
>  	if (err)
> @@ -823,11 +827,6 @@ got:
>  	}
>  	ext4_unlock_group(sb, group);
>  
> -	BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
> -	err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
> -	if (err)
> -		goto fail;
> -
>  	BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
>  	err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);
>  	if (err)
> 
> 


