[RFC] jbd2: reduce the number of writes when commiting a transacation

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC] jbd2: reduce the number of writes when commiting a transacation
@ 2012-04-20 11:06 Zheng Liu
  2012-04-20 11:21 ` Andreas Dilger
  0 siblings, 1 reply; 10+ messages in thread
From: Zheng Liu @ 2012-04-20 11:06 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel

Hi list,

In this thread[1], I found a defect in jbd2 because it needs two wrties
to finish a transacation because it writes journal header and data to
disk and it will write commit to disk after above writes are done.
AFAIK, in jbd2, it will call submit_bh twice at least to write the data
because journal header, data and commit are stored in different
buffer_heads.  If we don't call them separately, these calls might be
out of order.  Obviously, it must ensure that journal header and data are
written before commit.  But this brings a huge overhead in this
benchmark[2].  So, IMHO, if we could use *bio* to store these data
rather than buffer_head, we could avoid this overhead because we can
call submit_bio only once to write all of data, which contains journal
header, data and commit.  Here is an issue that I don't determine.  If
we use submit_bio to write journal data, it will make all of data with
WRITE_FLUSH_FUA flag.  But now there is only commit data with this flag.
I am not sure whether or not it brings some other unpridictable
problems. :(

Please feel free to comment this RFC.  Thank you.

1. http://www.spinics.net/lists/linux-ext4/msg31637.html
2. benchmark: time for((i=0;i<2000;i++)); do \
		dd if=/dev/zero of=/mnt/sda1/testfile conv=notrunc bs=4k \
		count=1 seek=`expr $i \* 16` oflag=sync,direct 2>/dev/null; \
		done

Regards,
Zheng

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] jbd2: reduce the number of writes when commiting a transacation
  2012-04-20 11:06 [RFC] jbd2: reduce the number of writes when commiting a transacation Zheng Liu
@ 2012-04-20 11:21 ` Andreas Dilger
  2012-04-23  2:25   ` Zheng Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Dilger @ 2012-04-20 11:21 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-ext4, linux-fsdevel

On 2012-04-20, at 5:06 AM, Zheng Liu wrote:
> In this thread[1], I found a defect in jbd2 because it needs two wrties
> to finish a transacation because it writes journal header and data to
> disk and it will write commit to disk after above writes are done.
> AFAIK, in jbd2, it will call submit_bh twice at least to write the data
> because journal header, data and commit are stored in different
> buffer_heads.  If we don't call them separately, these calls might be
> out of order.  Obviously, it must ensure that journal header and data are written before commit.  But this brings a huge overhead in this
> benchmark[2].  So, IMHO, if we could use *bio* to store these data
> rather than buffer_head, we could avoid this overhead because we can
> call submit_bio only once to write all of data, which contains journal
> header, data and commit.  Here is an issue that I don't determine.  If
> we use submit_bio to write journal data, it will make all of data with
> WRITE_FLUSH_FUA flag.  But now there is only commit data with this flag.

The reason that there are two separate writes is because if the write
of the commit block is reordered before the journal data, and only the
commit block is written before a crash (data is lost), then the journal
replay code may incorrectly think that the transaction is complete and
copy the unwritten (garbage) block to the wrong place.

I think there is potentially an existing solution to this problem,
which is the async journal commit feature.  It adds checksums to the
journal commit block, which allows verifying that all blocks were
written to disk properly even if the commit block is submitted at
the same time as the journal data blocks.

One problem with this implementation is that if an intermediate
journal commit has a data corruption (i.e. checksum of all data
blocks does not match the commit block), then it is not possible
to know which block(s) contain bad data.  After that, potentially
many thousands of other operations may be lost.

We discussed a scheme to store a separate checksum for each block
in a transaction, by storing a 16-bit checksum (likely the low
16 bits of CRC32c) into the high flags word for each block.  Then,
if one or more blocks is corrupted, it is possible to skip replay
of just those blocks, and potentially they will even be overwritten
by blocks in a later transaction, requiring no e2fsck at all.

> I am not sure whether or not it brings some other unpridictable
> problems. :(
> 
> Please feel free to comment this RFC.  Thank you.
> 
> 1. http://www.spinics.net/lists/linux-ext4/msg31637.html
> 2. benchmark: time for((i=0;i<2000;i++)); do \
> 		dd if=/dev/zero of=/mnt/sda1/testfile conv=notrunc bs=4k \
> 		count=1 seek=`expr $i \* 16` oflag=sync,direct 2>/dev/null; \
> 		done
> 
> Regards,
> Zheng
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cheers, Andreas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] jbd2: reduce the number of writes when commiting a transacation
  2012-04-20 11:21 ` Andreas Dilger
@ 2012-04-23  2:25   ` Zheng Liu
  2012-04-23  6:24     ` Andreas Dilger
  0 siblings, 1 reply; 10+ messages in thread
From: Zheng Liu @ 2012-04-23  2:25 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-ext4, linux-fsdevel

On Fri, Apr 20, 2012 at 05:21:59AM -0600, Andreas Dilger wrote:
> On 2012-04-20, at 5:06 AM, Zheng Liu wrote:
> > In this thread[1], I found a defect in jbd2 because it needs two wrties
> > to finish a transacation because it writes journal header and data to
> > disk and it will write commit to disk after above writes are done.
> > AFAIK, in jbd2, it will call submit_bh twice at least to write the data
> > because journal header, data and commit are stored in different
> > buffer_heads.  If we don't call them separately, these calls might be
> > out of order.  Obviously, it must ensure that journal header and data are written before commit.  But this brings a huge overhead in this
> > benchmark[2].  So, IMHO, if we could use *bio* to store these data
> > rather than buffer_head, we could avoid this overhead because we can
> > call submit_bio only once to write all of data, which contains journal
> > header, data and commit.  Here is an issue that I don't determine.  If
> > we use submit_bio to write journal data, it will make all of data with
> > WRITE_FLUSH_FUA flag.  But now there is only commit data with this flag.
> 
> The reason that there are two separate writes is because if the write
> of the commit block is reordered before the journal data, and only the
> commit block is written before a crash (data is lost), then the journal
> replay code may incorrectly think that the transaction is complete and
> copy the unwritten (garbage) block to the wrong place.
> 
> I think there is potentially an existing solution to this problem,
> which is the async journal commit feature.  It adds checksums to the
> journal commit block, which allows verifying that all blocks were
> written to disk properly even if the commit block is submitted at
> the same time as the journal data blocks.
> 
> One problem with this implementation is that if an intermediate
> journal commit has a data corruption (i.e. checksum of all data
> blocks does not match the commit block), then it is not possible
> to know which block(s) contain bad data.  After that, potentially
> many thousands of other operations may be lost.
> 
> We discussed a scheme to store a separate checksum for each block
> in a transaction, by storing a 16-bit checksum (likely the low
> 16 bits of CRC32c) into the high flags word for each block.  Then,
> if one or more blocks is corrupted, it is possible to skip replay
> of just those blocks, and potentially they will even be overwritten
> by blocks in a later transaction, requiring no e2fsck at all.

Thanks for pointing out this feature.  I have evaluated this feature in my
benchmark, and it can dramatically improve the performance. :-)

BTW, out of curiosity, why not set this feature on default?

Regards,
Zheng

> 
> > I am not sure whether or not it brings some other unpridictable
> > problems. :(
> > 
> > Please feel free to comment this RFC.  Thank you.
> > 
> > 1. http://www.spinics.net/lists/linux-ext4/msg31637.html
> > 2. benchmark: time for((i=0;i<2000;i++)); do \
> > 		dd if=/dev/zero of=/mnt/sda1/testfile conv=notrunc bs=4k \
> > 		count=1 seek=`expr $i \* 16` oflag=sync,direct 2>/dev/null; \
> > 		done
> > 
> > Regards,
> > Zheng
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] jbd2: reduce the number of writes when commiting a transacation
  2012-04-23  2:25   ` Zheng Liu
@ 2012-04-23  6:24     ` Andreas Dilger
  2012-04-23  7:23       ` Zheng Liu
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andreas Dilger @ 2012-04-23  6:24 UTC (permalink / raw)
  To: Zheng Liu; +Cc: Andreas Dilger, linux-ext4, linux-fsdevel

On 2012-04-22, at 21:25, Zheng Liu <gnehzuil.liu@gmail.com> wrote:

> On Fri, Apr 20, 2012 at 05:21:59AM -0600, Andreas Dilger wrote:
>> 
>> 
>> The reason that there are two separate writes is because if the write
>> of the commit block is reordered before the journal data, and only the
>> commit block is written before a crash (data is lost), then the journal
>> replay code may incorrectly think that the transaction is complete and
>> copy the unwritten (garbage) block to the wrong place.
>> 
>> I think there is potentially an existing solution to this problem,
>> which is the async journal commit feature.  It adds checksums to the
>> journal commit block, which allows verifying that all blocks were
>> written to disk properly even if the commit block is submitted at
>> the same time as the journal data blocks.
>> 
>> One problem with this implementation is that if an intermediate
>> journal commit has a data corruption (i.e. checksum of all data
>> blocks does not match the commit block), then it is not possible
>> to know which block(s) contain bad data.  After that, potentially
>> many thousands of other operations may be lost.
>> 
>> We discussed a scheme to store a separate checksum for each block
>> in a transaction, by storing a 16-bit checksum (likely the low
>> 16 bits of CRC32c) into the high flags word for each block.  Then,
>> if one or more blocks is corrupted, it is possible to skip replay
>> of just those blocks, and potentially they will even be overwritten
>> by blocks in a later transaction, requiring no e2fsck at all.
> 
> Thanks for pointing out this feature.  I have evaluated this feature in my
> benchmark, and it can dramatically improve the performance. :-)
> 
> BTW, out of curiosity, why not set this feature on default?

As mentioned previously, one drawback of depending on the checksums for transaction commit is that if one block in and of the older transactions is bad, then this will cause the bad block's transaction to be aborted, along with all of the later transactions.

By skipping the replay of many transactions after reboot (some of which may have already written to the filesystem before the crash) this may leave the filesystem in a very inconsistent state. 

A better solution. (which has been discussed, but not implemented yet) is to write the checksum for each block in the transaction, and only skip restoring the block(s) with a good checksum in an otherwise complete transaction. 

This would need to change the journal disk format, but might be a good time to do this with Darrick's other checksum patches. 

Cheers, Andreas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] jbd2: reduce the number of writes when commiting a transacation
  2012-04-23  6:24     ` Andreas Dilger
@ 2012-04-23  7:23       ` Zheng Liu
  2012-04-23 22:19       ` djwong
  2012-04-24 21:57       ` Jan Kara
  2 siblings, 0 replies; 10+ messages in thread
From: Zheng Liu @ 2012-04-23  7:23 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Andreas Dilger, linux-ext4, linux-fsdevel

On Mon, Apr 23, 2012 at 01:24:39AM -0500, Andreas Dilger wrote:
> On 2012-04-22, at 21:25, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> 
> > On Fri, Apr 20, 2012 at 05:21:59AM -0600, Andreas Dilger wrote:
> >> 
> >> 
> >> The reason that there are two separate writes is because if the write
> >> of the commit block is reordered before the journal data, and only the
> >> commit block is written before a crash (data is lost), then the journal
> >> replay code may incorrectly think that the transaction is complete and
> >> copy the unwritten (garbage) block to the wrong place.
> >> 
> >> I think there is potentially an existing solution to this problem,
> >> which is the async journal commit feature.  It adds checksums to the
> >> journal commit block, which allows verifying that all blocks were
> >> written to disk properly even if the commit block is submitted at
> >> the same time as the journal data blocks.
> >> 
> >> One problem with this implementation is that if an intermediate
> >> journal commit has a data corruption (i.e. checksum of all data
> >> blocks does not match the commit block), then it is not possible
> >> to know which block(s) contain bad data.  After that, potentially
> >> many thousands of other operations may be lost.
> >> 
> >> We discussed a scheme to store a separate checksum for each block
> >> in a transaction, by storing a 16-bit checksum (likely the low
> >> 16 bits of CRC32c) into the high flags word for each block.  Then,
> >> if one or more blocks is corrupted, it is possible to skip replay
> >> of just those blocks, and potentially they will even be overwritten
> >> by blocks in a later transaction, requiring no e2fsck at all.
> > 
> > Thanks for pointing out this feature.  I have evaluated this feature in my
> > benchmark, and it can dramatically improve the performance. :-)
> > 
> > BTW, out of curiosity, why not set this feature on default?
> 
> As mentioned previously, one drawback of depending on the checksums for transaction commit is that if one block in and of the older transactions is bad, then this will cause the bad block's transaction to be aborted, along with all of the later transactions.
> 
> By skipping the replay of many transactions after reboot (some of which may have already written to the filesystem before the crash) this may leave the filesystem in a very inconsistent state. 
> 
> A better solution. (which has been discussed, but not implemented yet) is to write the checksum for each block in the transaction, and only skip restoring the block(s) with a good checksum in an otherwise complete transaction. 
> 
> This would need to change the journal disk format, but might be a good time to do this with Darrick's other checksum patches. 

Thanks for your explanation. :-)

Regards,
Zheng

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] jbd2: reduce the number of writes when commiting a transacation
  2012-04-23  6:24     ` Andreas Dilger
  2012-04-23  7:23       ` Zheng Liu
@ 2012-04-23 22:19       ` djwong
  2012-04-24 19:41         ` Ted Ts'o
  2012-04-24 21:57       ` Jan Kara
  2 siblings, 1 reply; 10+ messages in thread
From: djwong @ 2012-04-23 22:19 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Zheng Liu, Andreas Dilger, linux-ext4, linux-fsdevel

On Mon, Apr 23, 2012 at 01:24:39AM -0500, Andreas Dilger wrote:
> On 2012-04-22, at 21:25, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> 
> > On Fri, Apr 20, 2012 at 05:21:59AM -0600, Andreas Dilger wrote:
> >> 
> >> 
> >> The reason that there are two separate writes is because if the write
> >> of the commit block is reordered before the journal data, and only the
> >> commit block is written before a crash (data is lost), then the journal
> >> replay code may incorrectly think that the transaction is complete and
> >> copy the unwritten (garbage) block to the wrong place.
> >> 
> >> I think there is potentially an existing solution to this problem,
> >> which is the async journal commit feature.  It adds checksums to the
> >> journal commit block, which allows verifying that all blocks were
> >> written to disk properly even if the commit block is submitted at
> >> the same time as the journal data blocks.
> >> 
> >> One problem with this implementation is that if an intermediate
> >> journal commit has a data corruption (i.e. checksum of all data
> >> blocks does not match the commit block), then it is not possible
> >> to know which block(s) contain bad data.  After that, potentially
> >> many thousands of other operations may be lost.
> >> 
> >> We discussed a scheme to store a separate checksum for each block
> >> in a transaction, by storing a 16-bit checksum (likely the low
> >> 16 bits of CRC32c) into the high flags word for each block.  Then,
> >> if one or more blocks is corrupted, it is possible to skip replay
> >> of just those blocks, and potentially they will even be overwritten
> >> by blocks in a later transaction, requiring no e2fsck at all.
> > 
> > Thanks for pointing out this feature.  I have evaluated this feature in my
> > benchmark, and it can dramatically improve the performance. :-)
> > 
> > BTW, out of curiosity, why not set this feature on default?
> 
> As mentioned previously, one drawback of depending on the checksums for
> transaction commit is that if one block in and of the older transactions is
> bad, then this will cause the bad block's transaction to be aborted, along
> with all of the later transactions.
> 
> By skipping the replay of many transactions after reboot (some of which may
> have already written to the filesystem before the crash) this may leave the
> filesystem in a very inconsistent state. 
> 
> A better solution. (which has been discussed, but not implemented yet) is to
> write the checksum for each block in the transaction, and only skip restoring
> the block(s) with a good checksum in an otherwise complete transaction. 
> 
> This would need to change the journal disk format, but might be a good time
> to do this with Darrick's other checksum patches. 

My huge checksum patchset _does_ include checksums for data blocks; see the
t_checksum field in struct journal_block_tag_s.  iirc the corresponding journal
replay modifications will skip over corrupt data blocks and keep going.

--D
> 
> Cheers, Andreas--
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] jbd2: reduce the number of writes when commiting a transacation
  2012-04-23 22:19       ` djwong
@ 2012-04-24 19:41         ` Ted Ts'o
  2012-04-25 20:34           ` djwong
  0 siblings, 1 reply; 10+ messages in thread
From: Ted Ts'o @ 2012-04-24 19:41 UTC (permalink / raw)
  To: djwong
  Cc: Andreas Dilger, Zheng Liu, Andreas Dilger, linux-ext4, linux-fsdevel

On Mon, Apr 23, 2012 at 10:19:48PM +0000, djwong wrote:
> 
> My huge checksum patchset _does_ include checksums for data blocks; see the
> t_checksum field in struct journal_block_tag_s.  iirc the corresponding journal
> replay modifications will skip over corrupt data blocks and keep going.

I need to check for this in the patches (and you may be doing this
already), but in the kernel failed checksums should result in an
ext4_error() call which will set the file system as corrupt and
needing to be checked.  And in e2fsck it should force a full check of
the file system.

					- Ted

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] jbd2: reduce the number of writes when commiting a transacation
  2012-04-24 19:41         ` Ted Ts'o
@ 2012-04-25 20:34           ` djwong
  0 siblings, 0 replies; 10+ messages in thread
From: djwong @ 2012-04-25 20:34 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Andreas Dilger, Zheng Liu, Andreas Dilger, linux-ext4, linux-fsdevel

On Tue, Apr 24, 2012 at 03:41:08PM -0400, Ted Ts'o wrote:
> On Mon, Apr 23, 2012 at 10:19:48PM +0000, djwong wrote:
> > 
> > My huge checksum patchset _does_ include checksums for data blocks; see the
> > t_checksum field in struct journal_block_tag_s.  iirc the corresponding journal
> > replay modifications will skip over corrupt data blocks and keep going.
> 
> I need to check for this in the patches (and you may be doing this
> already), but in the kernel failed checksums should result in an
> ext4_error() call which will set the file system as corrupt and
> needing to be checked.  And in e2fsck it should force a full check of
> the file system.

I'm fairly sure the kernel patch doesn't do that, I think it just skips the
block and moves on.

As for e2fsck... what does one do inside e2fsck to force a full check?  Is that
just the equivalent of passing -f?

--D
> 
> 					- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] jbd2: reduce the number of writes when commiting a transacation
  2012-04-23  6:24     ` Andreas Dilger
  2012-04-23  7:23       ` Zheng Liu
  2012-04-23 22:19       ` djwong
@ 2012-04-24 21:57       ` Jan Kara
  2012-04-25  1:27         ` Ted Ts'o
  2 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2012-04-24 21:57 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Zheng Liu, Andreas Dilger, linux-ext4, linux-fsdevel

On Mon 23-04-12 01:24:39, Andreas Dilger wrote:
> On 2012-04-22, at 21:25, Zheng Liu <gnehzuil.liu@gmail.com> wrote:
> > On Fri, Apr 20, 2012 at 05:21:59AM -0600, Andreas Dilger wrote:
> >> 
> >> 
> >> The reason that there are two separate writes is because if the write
> >> of the commit block is reordered before the journal data, and only the
> >> commit block is written before a crash (data is lost), then the journal
> >> replay code may incorrectly think that the transaction is complete and
> >> copy the unwritten (garbage) block to the wrong place.
> >> 
> >> I think there is potentially an existing solution to this problem,
> >> which is the async journal commit feature.  It adds checksums to the
> >> journal commit block, which allows verifying that all blocks were
> >> written to disk properly even if the commit block is submitted at
> >> the same time as the journal data blocks.
> >> 
> >> One problem with this implementation is that if an intermediate
> >> journal commit has a data corruption (i.e. checksum of all data
> >> blocks does not match the commit block), then it is not possible
> >> to know which block(s) contain bad data.  After that, potentially
> >> many thousands of other operations may be lost.
> >> 
> >> We discussed a scheme to store a separate checksum for each block
> >> in a transaction, by storing a 16-bit checksum (likely the low
> >> 16 bits of CRC32c) into the high flags word for each block.  Then,
> >> if one or more blocks is corrupted, it is possible to skip replay
> >> of just those blocks, and potentially they will even be overwritten
> >> by blocks in a later transaction, requiring no e2fsck at all.
> > 
> > Thanks for pointing out this feature.  I have evaluated this feature in my
> > benchmark, and it can dramatically improve the performance. :-)
> > 
> > BTW, out of curiosity, why not set this feature on default?
> 
> As mentioned previously, one drawback of depending on the checksums for
> transaction commit is that if one block in and of the older transactions
> is bad, then this will cause the bad block's transaction to be aborted,
> along with all of the later transactions.
  Also currently the async commit code has essentially unfixable bugs in
handling of cache flushes as I wrote in
http://www.spinics.net/lists/linux-ext4/msg30452.html. Because data blocks
are not part of journal checksum, it can happen with async commit code that
data is not safely on disk although transaction is completely committed. So
async commit code isn't really safe to use unless you are fine with
exposure of uninitialized data...

								Honza

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] jbd2: reduce the number of writes when commiting a transacation
  2012-04-24 21:57       ` Jan Kara
@ 2012-04-25  1:27         ` Ted Ts'o
  0 siblings, 0 replies; 10+ messages in thread
From: Ted Ts'o @ 2012-04-25  1:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andreas Dilger, Zheng Liu, Andreas Dilger, linux-ext4, linux-fsdevel

On Tue, Apr 24, 2012 at 11:57:09PM +0200, Jan Kara wrote:
>   Also currently the async commit code has essentially unfixable bugs in
> handling of cache flushes as I wrote in
> http://www.spinics.net/lists/linux-ext4/msg30452.html. Because data blocks
> are not part of journal checksum, it can happen with async commit code that
> data is not safely on disk although transaction is completely committed. So
> async commit code isn't really safe to use unless you are fine with
> exposure of uninitialized data...

With the old journal checksum, the data blocks *are* part of the
journal checksum.  That's not the reason I haven't enabled it as a
default (even though it would close to double fs_mark benchmarks).
The main issue is that e2fsck doesn't deal intelligently if some
commit *other* than the last one has a bad intelligent.

With the new journal checksum patches, each individual data block has
its own checksum, so we don't need to discard the entire commit;
instead we can just drop the individual block(s) that have a bad
checksum, and then force a full fsck run afterwards.

						- Ted

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-04-25 20:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-20 11:06 [RFC] jbd2: reduce the number of writes when commiting a transacation Zheng Liu
2012-04-20 11:21 ` Andreas Dilger
2012-04-23  2:25   ` Zheng Liu
2012-04-23  6:24     ` Andreas Dilger
2012-04-23  7:23       ` Zheng Liu
2012-04-23 22:19       ` djwong
2012-04-24 19:41         ` Ted Ts'o
2012-04-25 20:34           ` djwong
2012-04-24 21:57       ` Jan Kara
2012-04-25  1:27         ` Ted Ts'o

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.