linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] jbd: possible filesystem corruption fixes
@ 2008-04-18 13:29 Hidehiro Kawai
  0 siblings, 0 replies; 8+ messages in thread
From: Hidehiro Kawai @ 2008-04-18 13:29 UTC (permalink / raw)
  To: akpm, sct, adilger; +Cc: linux-kernel, linux-ext4, jack, sugita, Satoshi OSHIMA

Subject: [PATCH 0/4] jbd: possible filesystem corruption fixes

JBD's current I/O error handling is insufficient and can cause
filesystem corruption.  An example scenario:

1. a metadata buffer write to block B in the journal fails
2. the commit record is written successfully
3. the system crashes, reboots, and mounts the filesystem
4. in the recovery phase, the read from block B succeeds
5. the data read is written back to the filesystem, but it is
   stale metadata
6. some files and directories are lost!

This scenario is rare, but it can occur (e.g. on a transient I/O
error).  If we abort the journal between steps 1 and 2, this
tragedy can be avoided.
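
The scenario above can be sketched as a toy user-space model.  This is
not JBD code: every name below (toy_journal, toy_log_metadata,
toy_commit, toy_recover, toy_scenario_corrupts) is hypothetical, and the
journal is reduced to one block plus a commit flag.  It only illustrates
why checking the write error before the commit record matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLK_SIZE 16

/* Toy model (an assumption, not JBD): one journal block, one commit flag. */
struct toy_journal {
    char journal_blk[BLK_SIZE]; /* block B in the journal, may hold stale data */
    bool committed;             /* true once the commit record is written */
    bool aborted;               /* true if the journal was aborted on error */
};

/* Step 1: log metadata into block B; io_fails simulates a transient error. */
static int toy_log_metadata(struct toy_journal *j, const char *meta,
                            bool io_fails)
{
    assert(j != NULL && meta != NULL);
    if (io_fails)
        return -1;              /* block B keeps its old, stale contents */
    strncpy(j->journal_blk, meta, BLK_SIZE - 1);
    j->journal_blk[BLK_SIZE - 1] = '\0';
    return 0;
}

/* Step 2: write the commit record, or abort if errors are checked. */
static void toy_commit(struct toy_journal *j, int log_err, bool check_errors)
{
    if (log_err && check_errors) {
        j->aborted = true;      /* no commit record is written */
        return;
    }
    j->committed = true;
}

/* Steps 3-5: after a crash, replay block B only if a commit record exists. */
static bool toy_recover(const struct toy_journal *j, char *fs_blk)
{
    if (!j->committed)
        return false;           /* uncommitted transaction is skipped */
    memcpy(fs_blk, j->journal_blk, BLK_SIZE);
    return true;
}

/* Runs the whole scenario; returns true if stale metadata reached the fs. */
bool toy_scenario_corrupts(bool check_errors)
{
    struct toy_journal j = { .journal_blk = "stale" };
    char fs_blk[BLK_SIZE] = "current";
    int err = toy_log_metadata(&j, "new", true);

    toy_commit(&j, err, check_errors);
    toy_recover(&j, fs_blk);
    return strcmp(fs_blk, "stale") == 0;
}
```

In this model, toy_scenario_corrupts(false) ends with the stale block
replayed over the current one, while toy_scenario_corrupts(true) leaves
the filesystem block untouched because no commit record exists.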

This patch set fixes several error handling problems to protect
against filesystem corruption caused by I/O errors.  It covers
only the JBD and ext3 parts.

This patch set is against 2.6.25.

[PATCH 1/4] jbd: strictly check for write errors on data buffers
[PATCH 2/4] jbd: ordered data integrity fix
[PATCH 3/4] jbd: abort when failed to log metadata buffers
[PATCH 4/4] jbd: fix error handling for checkpoint io

Regards,
-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center





* Re: [PATCH 0/4] jbd: possible filesystem corruption fixes
  2008-04-21 21:08     ` Andreas Dilger
@ 2008-04-23 12:45       ` Hidehiro Kawai
  0 siblings, 0 replies; 8+ messages in thread
From: Hidehiro Kawai @ 2008-04-23 12:45 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Mingming Cao, Josef Bacik, akpm, sct, adilger, linux-kernel,
	linux-ext4, jack, sugita, Satoshi OSHIMA

Andreas Dilger wrote:

> On Apr 18, 2008  12:26 -0700, Mingming Cao wrote:
> 
>>On Fri, 2008-04-18 at 10:09 -0400, Josef Bacik wrote:
>>
>>>On Fri, Apr 18, 2008 at 10:00:54PM +0900, Hidehiro Kawai wrote:
>>>
>>>>Subject: [PATCH 0/4] jbd: possible filesystem corruption fixes
>>>>
>>>>The current JBD is not sufficient for I/O error handling.  It can
>>>>cause filesystem corruption.   An example scenario:
>>>>
>>>>1. fail to write a metadata buffer to block B in the journal
>>>>2. succeed to write the commit record
>>>>3. the system crashes, reboots and mount the filesystem
>>>>4. in the recovery phase, succeed to read data from block B
>>>>5. write back the read data to the filesystem, but it is a stale
>>>>   metadata
>>>>6. lose some files and directories!
>>>>
>>>>This scenario is a rare case, but it (temporal I/O error)
>>>>can occur.  If we abort the journal between 1. and 2., this
>>>>tragedy can be avoided.
>>>>
>>>>This patch set fixes several error handling problems to protect
>>>>from filesystem corruption caused by I/O errors.  It has been
>>>>done only for JBD and ext3 parts.
>>
>>Could you sent Ext4/JBD2 version patches? Thanks!
> 
> 
> Actually, the journal checksum in ext4/jbd2 detects this kind of error,
> as well as errors that are NOT reported to the caller (e.g. media errors
> not reported to the kernel).

That's an interesting feature.  I read the journal checksum patch, and
it seems to fix the problem addressed by PATCH 3/4.  However, the
journal checksum feature is optional, so PATCH 3/4 will still be
needed as long as checksumming isn't always turned on.

> One question is whether we want to _introduce_ a point of failure to the
> filesystem that may never actually cause a problem for the system,
> since the journal is only needed in the case of a crash.  By aborting
> the journal at this point instead of letting the checkpoint write the
> data to the filesystem then we are guaranteed a filesystem failure
> instead of "likely no problem at all".

I think it depends on the system and the administrator.
When we fail to write metadata to the journal, we can either...

  (a) abort journaling
      - the filesystem stays consistent if the system crashes
      - the system will stop because the filesystem becomes
        read-only (the default)
  (b) only do printk()
      - the system can continue to work
      - bad journalled data may break the filesystem if the system
        crashes

A user who demands high data integrity will choose (a), and
a user who demands high availability will choose (b).
We might want to let the user specify the behavior on error,
as with the "errors" mount option.
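
As a rough illustration of such a knob, here is a minimal user-space
sketch.  It assumes a hypothetical two-valued policy; the enum, struct,
and function names are invented for this example and do not correspond
to any existing JBD or ext3 interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policy, loosely modelled on the "errors" mount option. */
enum toy_journal_err_policy {
    TOY_JERR_ABORT,     /* (a) abort journaling: favours data integrity */
    TOY_JERR_CONTINUE,  /* (b) report only: favours availability */
};

/* Hypothetical per-journal state for this sketch. */
struct toy_jstate {
    enum toy_journal_err_policy policy;
    bool aborted;               /* journal has been aborted */
    bool read_only;             /* filesystem has gone read-only */
    unsigned int logged_errors; /* number of errors reported so far */
};

/* Called when writing a metadata buffer to the journal fails. */
void toy_handle_log_error(struct toy_jstate *s, int err)
{
    assert(s != NULL);

    /* Both policies report the error so the administrator can react. */
    s->logged_errors++;
    fprintf(stderr, "toy-jbd: metadata log write failed (err=%d)\n", err);

    if (s->policy == TOY_JERR_ABORT) {
        /* (a): stop journaling; the filesystem becomes read-only. */
        s->aborted = true;
        s->read_only = true;
    }
    /* (b): nothing else to do; the system keeps running. */
}
```

With TOY_JERR_ABORT the state ends up aborted and read-only after one
error; with TOY_JERR_CONTINUE only the error counter changes.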

 
> The journal checksum would detect the bad data in the transaction in the
> cases where it is important, and during operation it makes more sense
> to report the error via printk() so the administrator has some chance to
> do something about it.  There is no reason why the jbd2 change couldn't be
> merged back to jbd so ext3 could use the journal checksumming.  It is a
> "COMPAT" journal feature.

That's interesting.  For example, when an fsync operation is issued,
we could commit the current transaction, then read back the journalled
data of that transaction to verify the checksum.  If bad data is
detected, flush the whole journal.  Aborting the journal would also
make sense because the journal space is erroneous.
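
A minimal sketch of that read-back check, under stated assumptions: the
checksum below is a simple Fletcher-16, chosen only to keep the example
short, and is not the algorithm the jbd2 journal checksum uses.  The
function and type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 over a buffer; a stand-in checksum for this sketch only. */
uint16_t toy_checksum(const uint8_t *data, size_t len)
{
    uint16_t s1 = 0, s2 = 0;

    for (size_t i = 0; i < len; i++) {
        s1 = (uint16_t)((s1 + data[i]) % 255);
        s2 = (uint16_t)((s2 + s1) % 255);
    }
    return (uint16_t)((s2 << 8) | s1);
}

enum toy_verify_result { TOY_VERIFY_OK, TOY_VERIFY_BAD };

/*
 * After the commit: journal_copy is the transaction data as read back
 * from the journal, committed_sum is the checksum recorded at commit
 * time.  A mismatch means the journal copy cannot be trusted.
 */
enum toy_verify_result toy_verify_after_commit(const uint8_t *journal_copy,
                                               size_t len,
                                               uint16_t committed_sum)
{
    assert(journal_copy != NULL);
    return toy_checksum(journal_copy, len) == committed_sum
        ? TOY_VERIFY_OK : TOY_VERIFY_BAD;
}
```

On TOY_VERIFY_BAD the caller would then flush the whole journal or
abort it, as described above.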

Regards,
-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center



* Re: [PATCH 0/4] jbd: possible filesystem corruption fixes
  2008-04-18 19:26   ` Mingming Cao
  2008-04-21 21:08     ` Andreas Dilger
@ 2008-04-23 11:01     ` Hidehiro Kawai
  1 sibling, 0 replies; 8+ messages in thread
From: Hidehiro Kawai @ 2008-04-23 11:01 UTC (permalink / raw)
  To: cmm
  Cc: Josef Bacik, akpm, sct, adilger, linux-kernel, linux-ext4, jack,
	sugita, Satoshi OSHIMA

Mingming Cao wrote:

>>>This patch set fixes several error handling problems to protect
>>>from filesystem corruption caused by I/O errors.  It has been
>>>done only for JBD and ext3 parts.
> 
> Could you sent Ext4/JBD2 version patches? Thanks!

I will try, but I don't know whether I can send the Ext4/JBD2 version
within a reasonable time because I haven't read much of the Ext4/JBD2
code yet.
 
>>There doesn't seem like much point in taking these patches as Jan is rewriting
>>the ordered mode path and most of these functions will be going away soon.
>>Those patches seem like they will be coming soon and will obsolete these.
> 
> I hope we have a better ordered mode very soon too. Just thought it's
> still valid to fix the current ordered mode for people who uses
> linux-2.6.25 kernel today. 

And users of older kernels will be happy if I or someone else
backports this patch set.

Regards,
-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center



* Re: [PATCH 0/4] jbd: possible filesystem corruption fixes
  2008-04-18 14:09 ` Josef Bacik
  2008-04-18 19:26   ` Mingming Cao
@ 2008-04-23 10:59   ` Hidehiro Kawai
  1 sibling, 0 replies; 8+ messages in thread
From: Hidehiro Kawai @ 2008-04-23 10:59 UTC (permalink / raw)
  To: Josef Bacik
  Cc: akpm, sct, adilger, linux-kernel, linux-ext4, jack, sugita,
	Satoshi OSHIMA

Josef Bacik wrote:

> On Fri, Apr 18, 2008 at 10:00:54PM +0900, Hidehiro Kawai wrote:
> 
>>Subject: [PATCH 0/4] jbd: possible filesystem corruption fixes
>>
>>This patch set fixes several error handling problems to protect
>>from filesystem corruption caused by I/O errors.  It has been
>>done only for JBD and ext3 parts.
>>
> There doesn't seem like much point in taking these patches as Jan is rewriting
> the ordered mode path and most of these functions will be going away soon.
> Those patches seem like they will be coming soon and will obsolete these.

Yes, PATCH 1/4 and PATCH 2/4 are specific to the ordered mode, and Jan's
patches seem to fix the same problems.  But the remaining patches target
generic journaling problems, so I think those patches are still needed.

Regards,
-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center




* Re: [PATCH 0/4] jbd: possible filesystem corruption fixes
  2008-04-18 19:26   ` Mingming Cao
@ 2008-04-21 21:08     ` Andreas Dilger
  2008-04-23 12:45       ` Hidehiro Kawai
  2008-04-23 11:01     ` Hidehiro Kawai
  1 sibling, 1 reply; 8+ messages in thread
From: Andreas Dilger @ 2008-04-21 21:08 UTC (permalink / raw)
  To: Mingming Cao
  Cc: Josef Bacik, Hidehiro Kawai, akpm, sct, adilger, linux-kernel,
	linux-ext4, jack, sugita, Satoshi OSHIMA

On Apr 18, 2008  12:26 -0700, Mingming Cao wrote:
> On Fri, 2008-04-18 at 10:09 -0400, Josef Bacik wrote:
> > On Fri, Apr 18, 2008 at 10:00:54PM +0900, Hidehiro Kawai wrote:
> > > Subject: [PATCH 0/4] jbd: possible filesystem corruption fixes
> > > 
> > > The current JBD is not sufficient for I/O error handling.  It can
> > > cause filesystem corruption.   An example scenario:
> > > 
> > > 1. fail to write a metadata buffer to block B in the journal
> > > 2. succeed to write the commit record
> > > 3. the system crashes, reboots and mount the filesystem
> > > 4. in the recovery phase, succeed to read data from block B
> > > 5. write back the read data to the filesystem, but it is a stale
> > >    metadata
> > > 6. lose some files and directories!
> > > 
> > > This scenario is a rare case, but it (temporal I/O error)
> > > can occur.  If we abort the journal between 1. and 2., this
> > > tragedy can be avoided.
> > > 
> > > This patch set fixes several error handling problems to protect
> > > from filesystem corruption caused by I/O errors.  It has been
> > > done only for JBD and ext3 parts.
> 
> Could you sent Ext4/JBD2 version patches? Thanks!

Actually, the journal checksum in ext4/jbd2 detects this kind of error,
as well as errors that are NOT reported to the caller (e.g. media errors
not reported to the kernel).

One question is whether we want to _introduce_ a point of failure to the
filesystem that may never actually cause a problem for the system,
since the journal is only needed in the case of a crash.  By aborting
the journal at this point instead of letting the checkpoint write the
data to the filesystem then we are guaranteed a filesystem failure
instead of "likely no problem at all".

The journal checksum would detect the bad data in the transaction in the
cases where it is important, and during operation it makes more sense
to report the error via printk() so the administrator has some chance to
do something about it.  There is no reason why the jbd2 change couldn't be
merged back to jbd so ext3 could use the journal checksumming.  It is a
"COMPAT" journal feature.

> > There doesn't seem like much point in taking these patches as Jan is rewriting
> > the ordered mode path and most of these functions will be going away soon.
> > Those patches seem like they will be coming soon and will obsolete these.
> 
> I hope we have a better ordered mode very soon too. Just thought it's
> still valid to fix the current ordered mode for people who uses
> linux-2.6.25 kernel today. 

I agree that we should at least report the errors to the syslog (if this
isn't happening already) so the admin knows there is a problem, and I also
agree that waiting for some future patch isn't a good reason to stop making
fixes to the current code.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



* Re: [PATCH 0/4] jbd: possible filesystem corruption fixes
  2008-04-18 14:09 ` Josef Bacik
@ 2008-04-18 19:26   ` Mingming Cao
  2008-04-21 21:08     ` Andreas Dilger
  2008-04-23 11:01     ` Hidehiro Kawai
  2008-04-23 10:59   ` Hidehiro Kawai
  1 sibling, 2 replies; 8+ messages in thread
From: Mingming Cao @ 2008-04-18 19:26 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Hidehiro Kawai, akpm, sct, adilger, linux-kernel, linux-ext4,
	jack, sugita, Satoshi OSHIMA

On Fri, 2008-04-18 at 10:09 -0400, Josef Bacik wrote:
> On Fri, Apr 18, 2008 at 10:00:54PM +0900, Hidehiro Kawai wrote:
> > Subject: [PATCH 0/4] jbd: possible filesystem corruption fixes
> > 
> > The current JBD is not sufficient for I/O error handling.  It can
> > cause filesystem corruption.   An example scenario:
> > 
> > 1. fail to write a metadata buffer to block B in the journal
> > 2. succeed to write the commit record
> > 3. the system crashes, reboots and mount the filesystem
> > 4. in the recovery phase, succeed to read data from block B
> > 5. write back the read data to the filesystem, but it is a stale
> >    metadata
> > 6. lose some files and directories!
> > 
> > This scenario is a rare case, but it (temporal I/O error)
> > can occur.  If we abort the journal between 1. and 2., this
> > tragedy can be avoided.
> > 
> > This patch set fixes several error handling problems to protect
> > from filesystem corruption caused by I/O errors.  It has been
> > done only for JBD and ext3 parts.
> >
> 

Could you send Ext4/JBD2 versions of the patches? Thanks!

> There doesn't seem like much point in taking these patches as Jan is rewriting
> the ordered mode path and most of these functions will be going away soon.
> Those patches seem like they will be coming soon and will obsolete these.
> 

I hope we have a better ordered mode very soon too. I just thought it's
still valid to fix the current ordered mode for people who use the
linux-2.6.25 kernel today.

Mingming
> Josef 



* Re: [PATCH 0/4] jbd: possible filesystem corruption fixes
  2008-04-18 13:00 Hidehiro Kawai
@ 2008-04-18 14:09 ` Josef Bacik
  2008-04-18 19:26   ` Mingming Cao
  2008-04-23 10:59   ` Hidehiro Kawai
  0 siblings, 2 replies; 8+ messages in thread
From: Josef Bacik @ 2008-04-18 14:09 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: akpm, sct, adilger, linux-kernel, linux-ext4, jack, sugita,
	Satoshi OSHIMA

On Fri, Apr 18, 2008 at 10:00:54PM +0900, Hidehiro Kawai wrote:
> Subject: [PATCH 0/4] jbd: possible filesystem corruption fixes
> 
> The current JBD is not sufficient for I/O error handling.  It can
> cause filesystem corruption.   An example scenario:
> 
> 1. fail to write a metadata buffer to block B in the journal
> 2. succeed to write the commit record
> 3. the system crashes, reboots and mount the filesystem
> 4. in the recovery phase, succeed to read data from block B
> 5. write back the read data to the filesystem, but it is a stale
>    metadata
> 6. lose some files and directories!
> 
> This scenario is a rare case, but it (temporal I/O error)
> can occur.  If we abort the journal between 1. and 2., this
> tragedy can be avoided.
> 
> This patch set fixes several error handling problems to protect
> from filesystem corruption caused by I/O errors.  It has been
> done only for JBD and ext3 parts.
>

There doesn't seem to be much point in taking these patches, as Jan is
rewriting the ordered mode path and most of these functions will be going
away soon.  His patches seem to be coming soon and will obsolete these.

Josef 


* [PATCH 0/4] jbd: possible filesystem corruption fixes
@ 2008-04-18 13:00 Hidehiro Kawai
  2008-04-18 14:09 ` Josef Bacik
  0 siblings, 1 reply; 8+ messages in thread
From: Hidehiro Kawai @ 2008-04-18 13:00 UTC (permalink / raw)
  To: akpm, sct, adilger; +Cc: linux-kernel, linux-ext4, jack, sugita, Satoshi OSHIMA

Subject: [PATCH 0/4] jbd: possible filesystem corruption fixes

JBD's current I/O error handling is insufficient and can cause
filesystem corruption.  An example scenario:

1. a metadata buffer write to block B in the journal fails
2. the commit record is written successfully
3. the system crashes, reboots, and mounts the filesystem
4. in the recovery phase, the read from block B succeeds
5. the data read is written back to the filesystem, but it is
   stale metadata
6. some files and directories are lost!

This scenario is rare, but it can occur (e.g. on a transient I/O
error).  If we abort the journal between steps 1 and 2, this
tragedy can be avoided.

This patch set fixes several error handling problems to protect
against filesystem corruption caused by I/O errors.  It covers
only the JBD and ext3 parts.

This patch set is against 2.6.25.

[PATCH 1/4] jbd: strictly check for write errors on data buffers
[PATCH 2/4] jbd: ordered data integrity fix
[PATCH 3/4] jbd: abort when failed to log metadata buffers
[PATCH 4/4] jbd: fix error handling for checkpoint io

Regards,
-- 
Hidehiro Kawai
Hitachi, Systems Development Laboratory
Linux Technology Center





Thread overview: 8+ messages
2008-04-18 13:29 [PATCH 0/4] jbd: possible filesystem corruption fixes Hidehiro Kawai
  -- strict thread matches above, loose matches on Subject: below --
2008-04-18 13:00 Hidehiro Kawai
2008-04-18 14:09 ` Josef Bacik
2008-04-18 19:26   ` Mingming Cao
2008-04-21 21:08     ` Andreas Dilger
2008-04-23 12:45       ` Hidehiro Kawai
2008-04-23 11:01     ` Hidehiro Kawai
2008-04-23 10:59   ` Hidehiro Kawai
