* Re: Ext3 behavior on power failure
From: Ric Wheeler @ 2007-03-23 10:47 UTC
To: armangau_philippe; +Cc: csar, linux-ext4, ext3-users

armangau_philippe@emc.com wrote:
> Hi all,
>
> We are building a new system which is going to use ext3 FS. We would like
> to know more about the behavior of ext3 in the case of failure. But before
> I proceed, I would like to share more information about our future system.
>
> * Our application always does an fsync on files
> * When symbolic links (more specifically fast symlink) are created, the
>   host directory is also fsync'ed.
> * Our application is also going to front an EMC disk array configured
>   using RAID5 or RAID6.
> * We will be using multipathing so that we can assume that no disk errors
>   will be reported.
>
> In this context, we would like to know the following for recovery after a
> power outage:
>
> 1. When will an fsck have to be run (not counting the scheduled fsck every
>    N-mounts)?
> 2. In the case of a crash, are the fsync-ed file contents and symbolic
>    links safe no matter what?
>
> Thanks,

This is an interesting twist on some of the discussion that we have had at
the recent workshop and in other forums on hardening file systems in order
to prevent the need to fsck.

The twist is that we have a disk that will not lose power without being able
to write to platter all of the data that has been sent - this is the case
for most mid-range or higher disk arrays.

If the application can precisely use fsync() on files, directories and
symlinks, it wants to know that all objects that have completed a successful
fsync are safe on disk. It also wants to know that the file system will not
need any recovery beyond replaying transactions after a power outage/reboot -
simply mount, let the transactions get replayed and you should be good to go
without the fsck.

The hard part of the question is to understand when and how often we will
fail to deliver this easy case. Also, does any of the hardening in ext4 help
here?

Maybe the Stanford explode work/analysis sheds some light on this behavior?

ric
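The fsync discipline described in the message above (fsync the file, and fsync the host directory after creating a name or symlink in it) can be sketched as follows. This is a minimal illustration for Linux, not the poster's actual application code: the helper names `fsync_dir`, `durable_write` and `durable_symlink` are hypothetical, and the sketch relies on the directory fsync alone to make the symlink durable, as the original poster describes.

```python
import os


def fsync_dir(path):
    """Flush a directory's entries (new names, symlinks) to stable storage."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_write(path, data):
    """Write data to path, fsync the file, then fsync its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                       # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                      # file contents and inode
    finally:
        os.close(fd)
    # The directory entry for a newly created file lives in the parent directory.
    fsync_dir(os.path.dirname(os.path.abspath(path)))


def durable_symlink(target, link_path):
    """Create a symlink, then fsync the directory that holds it."""
    os.symlink(target, link_path)
    fsync_dir(os.path.dirname(os.path.abspath(link_path)))
```

Whether a successful return really means "on stable storage" still depends on the layers below the file system, which is the point the rest of the thread turns on.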
* Re: Ext3 behavior on power failure
From: Jan Kara @ 2007-03-28 12:40 UTC
To: Ric Wheeler; +Cc: armangau_philippe, ext3-users, linux-ext4, csar

> armangau_philippe@emc.com wrote:
> > [...]
>
> This is an interesting twist on some of the discussion that we have had
> at the recent workshop and in other forums on hardening file systems in
> order to prevent the need to fsck.
>
> The twist is that we have a disk that will not lose power without being
> able to write to platter all of the data that has been sent - this is
> the case for most mid-range or higher disk arrays.
>
> If the application can precisely use fsync() on files, directories and
> symlinks, it wants to know that all objects that have completed a
> successful fsync are safe on disk. It also wants to know that the file
> system will not need any recovery beyond replaying transactions after a
> power outage/reboot - simply mount, let the transactions get replayed and
> you should be good to go without the fsck.
>
> The hard part of the question is to understand when and how often we
> will fail to deliver this easy case. Also, does any of the hardening in
> ext4 help here?

  I'm probably misunderstanding something because the answer seems to be
too obvious to me :) But anyway I'll write it so that you can correct me:
  Due to journalling guarantees you should get a consistent FS whenever you
replay the log (unless there are some software bugs or hardware problems,
which is why fsck is run once every several mounts anyway).
  If you fsync() your data, you are guaranteed that your data are also
safely on disk when fsync() returns. So what is the question here?

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs
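The journalling guarantee in the message above (replaying the log yields a consistent file system) can be pictured with a toy write-ahead log. This is only a sketch of the general idea: the record layout and the `replay` function are invented for illustration and do not reflect ext3's actual journal format.

```python
def replay(journal, disk):
    """Apply only fully committed transactions from the journal to disk.

    journal: list of records, ("write", txid, block, value) or ("commit", txid).
    disk: dict mapping block number -> value, updated in place and returned.
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, value = rec
            disk[block] = value
    return disk


# Transaction 1 committed before the crash; transaction 2 did not.
journal = [
    ("write", 1, 10, "inode A"),
    ("write", 1, 11, "dirent A"),
    ("commit", 1),
    ("write", 2, 12, "inode B"),   # power lost before the commit record
]
disk = replay(journal, {10: "old", 12: "old"})
```

After replay, blocks 10 and 11 carry transaction 1's updates and block 12 keeps its old value, so the half-finished transaction 2 leaves no partial state behind.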
* Re: Ext3 behavior on power failure
From: John Anthony Kazos Jr. @ 2007-03-28 13:17 UTC
To: Jan Kara; +Cc: Ric Wheeler, armangau_philippe, ext3-users, linux-ext4, csar

> If you fsync() your data, you are guaranteed that your data are also
> safely on disk when fsync() returns. So what is the question here?

Pardon a newbie's intrusion, but I do know this isn't true. There is a
window of possible loss because of the multitude of layers of caching,
especially within the drive itself. Unless there is a super_duper_fsync()
that is able to actually poll the hardware and get a confirmation that the
internal buffers are purged?
* Re: Ext3 behavior on power failure
From: Jan Kara @ 2007-03-28 13:29 UTC
To: John Anthony Kazos Jr.
Cc: Ric Wheeler, armangau_philippe, ext3-users, linux-ext4, csar

> > If you fsync() your data, you are guaranteed that your data are also
> > safely on disk when fsync() returns. So what is the question here?
> Pardon a newbie's intrusion, but I do know this isn't true. There is a
> window of possible loss because of the multitude of layers of caching,
> especially within the drive itself. Unless there is a super_duper_fsync()
> that is able to actually poll the hardware and get a confirmation that the
> internal buffers are purged?

  OK :), to correct myself: after fsync() returns, all the data has been
acked by the disk (or at least it should be, unless there's a bug
somewhere). So if there are caches in the hardware which the hardware is
not able to flush on power failure, that's bad luck... That's why you
should turn off write caching on cheaper disks if you really care about
data integrity.

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs
* RE: Ext3 behavior on power failure
From: armangau_philippe @ 2007-03-28 14:17 UTC
To: jakj, jack; +Cc: linux-ext4, ric, ext3-users, csar

In my case the disk cache is not a problem. We use an EMC disk array whose
write cache is protected. Once the data has made it to the disk array we
can assume it is safe.

Thx
Philippe

-----Original Message-----
From: John Anthony Kazos Jr. [mailto:jakj@j-a-k-j.com]
Sent: Wednesday, March 28, 2007 9:17 AM
To: Jan Kara
Cc: wheeler, richard; armangau, philippe; ext3-users@redhat.com;
linux-ext4@vger.kernel.org; csar@stanford.edu
Subject: Re: Ext3 behavior on power failure

> If you fsync() your data, you are guaranteed that your data are also
> safely on disk when fsync() returns. So what is the question here?

Pardon a newbie's intrusion, but I do know this isn't true. There is a
window of possible loss because of the multitude of layers of caching,
especially within the drive itself. Unless there is a super_duper_fsync()
that is able to actually poll the hardware and get a confirmation that the
internal buffers are purged?
* Re: Ext3 behavior on power failure
From: Jan Kara @ 2007-03-28 15:00 UTC
To: armangau_philippe; +Cc: jakj, jack, ric, ext3-users, linux-ext4, csar

On Wed 28-03-07 10:17:33, armangau_philippe@emc.com wrote:
> In my case the disk cache is not a problem. We use an EMC disk array
> whose write cache is protected. Once the data has made it to the disk
> array we can assume it is safe.

  Then if you are able to reproduce a situation where not all data is
written after fsync(); poweroff; that is a bug worth reporting.

								Honza

> -----Original Message-----
> From: John Anthony Kazos Jr. [mailto:jakj@j-a-k-j.com]
> Sent: Wednesday, March 28, 2007 9:17 AM
> Subject: Re: Ext3 behavior on power failure
> [...]

-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs
* Re: Ext3 behavior on power failure
From: Bruno Wolff III @ 2007-04-18 21:49 UTC
To: John Anthony Kazos Jr.
Cc: csar, linux-ext4, Jan Kara, ext3-users, Ric Wheeler

On Wed, Mar 28, 2007 at 09:17:27 -0400,
  "John Anthony Kazos Jr." <jakj@j-a-k-j.com> wrote:
> > If you fsync() your data, you are guaranteed that your data are also
> > safely on disk when fsync() returns. So what is the question here?
>
> Pardon a newbie's intrusion, but I do know this isn't true. There is a
> window of possible loss because of the multitude of layers of caching,
> especially within the drive itself. Unless there is a super_duper_fsync()
> that is able to actually poll the hardware and get a confirmation that the
> internal buffers are purged?

That is why, if the hardware cache isn't battery backed or the device
doesn't support returning the status of particular commands, you need to
either disable write caching on the drives or use cache flushes via write
barriers (if every device in the stack of block devices supports them).
Of course nothing is perfectly safe.
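As a rough way to see whether the kernel believes a drive has a volatile write-back cache, one can read the `queue/write_cache` attribute in sysfs. This is a sketch under assumptions: the attribute is only present on Linux kernels much newer than those discussed in this thread, the function names are hypothetical, and the value reflects what the kernel was told about the device, which may not describe a battery-backed array cache accurately.

```python
from pathlib import Path


def parse_write_cache(text):
    """Return True if the sysfs value reports a write-back (volatile) cache."""
    value = text.strip()
    if value == "write back":
        return True
    if value == "write through":
        return False
    raise ValueError(f"unrecognised write_cache value: {value!r}")


def volatile_cache_enabled(device, sysfs_root="/sys/block"):
    """Read queue/write_cache for a block device name such as 'sda'."""
    path = Path(sysfs_root) / device / "queue" / "write_cache"
    return parse_write_cache(path.read_text())
```

A result of True means cache flushes (or disabling the cache) matter for fsync() durability on that device; `sysfs_root` is a parameter only so the function can be pointed at a test directory.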
* Re: Ext3 behavior on power failure
From: Ric Wheeler @ 2007-03-28 23:00 UTC
To: Jan Kara; +Cc: csar, linux-ext4, ext3-users

Jan Kara wrote:
> [...]
>   I'm probably misunderstanding something because the answer seems to be
> too obvious to me :) But anyway I'll write it so that you can correct me:
>   Due to journalling guarantees you should get a consistent FS whenever
> you replay the log (unless there are some software bugs or hardware
> problems, which is why fsck is run once every several mounts anyway).
>   If you fsync() your data, you are guaranteed that your data are also
> safely on disk when fsync() returns. So what is the question here?
>
> 								Honza

I think that the real question here is: in practice, how often does this
really hold true? When it fails, how long does it take to recover the file
system?

There are a lot of odd errors that can happen when you monitor a large
enough number of file systems. In my experience, I would guess that disk
errors are clearly the leading cause of issues, followed by software bugs
(file system, firmware, etc.) and then a group of errors caused by various
occasional things (bad DRAM in the server/HBA/disk, bad cables, etc.). Note
that using a high-end array does not eliminate errors, it just reduces the
rate (hopefully by a large amount).

What is really hard to predict is the rate of the failures that require
fsck with our current file system (say for a specific hardware setup) and
how changes like the checksumming in ext4 can help us ride through these
errors without needing a full fsck.

This rate has a direct impact on how much pain an fsck will inflict and how
important redundancy is to avoid having the file system be a single point
of failure.

ric
* Re: Ext3 behavior on power failure
From: Jan Kara @ 2007-03-29 8:00 UTC
To: Ric Wheeler; +Cc: csar, linux-ext4, ext3-users

On Wed 28-03-07 19:00:54, Ric Wheeler wrote:
> Jan Kara wrote:
> > [...]
> >   If you fsync() your data, you are guaranteed that your data are also
> > safely on disk when fsync() returns. So what is the question here?
>
> I think that the real question here is: in practice, how often does this
> really hold true? When it fails, how long does it take to recover the
> file system?

  I see, thanks for the explanation :).

> There are a lot of odd errors that can happen when you monitor a large
> enough number of file systems. In my experience, I would guess that disk
> errors are clearly the leading cause of issues, followed by software bugs
> (file system, firmware, etc.) and then a group of errors caused by various
> occasional things (bad DRAM in the server/HBA/disk, bad cables, etc.).
> Note that using a high-end array does not eliminate errors, it just
> reduces the rate (hopefully by a large amount).
>
> What is really hard to predict is the rate of the failures that require
> fsck with our current file system (say for a specific hardware setup) and
> how changes like the checksumming in ext4 can help us ride through these
> errors without needing a full fsck.

  OK. All the features I've seen so far were aimed more at detecting that
such an unexpected problem happened than at fixing it or making the fix
faster. So currently it seems to me that any such unexpected failure
requires fsck...

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs
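The distinction drawn above, that a checksum detects damage but does not repair it, can be shown with a few lines. This is an illustrative sketch only: the `seal` and `verify` helpers are hypothetical, and zlib's CRC32 is a stand-in, not a statement about which checksum ext4 uses or where it stores it.

```python
import zlib


def seal(payload):
    """Prefix a metadata block with a 4-byte CRC32 of its payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def verify(block):
    """Return True if the stored CRC32 matches the payload that follows it."""
    stored = int.from_bytes(block[:4], "big")
    return zlib.crc32(block[4:]) == stored


block = seal(b"group descriptor 7")
damaged = block[:-1] + bytes([block[-1] ^ 0x01])   # flip one bit in the payload

intact_ok = verify(block)       # checksum matches the untouched block
damaged_ok = verify(damaged)    # mismatch: corruption detected, nothing repaired
```

A failed `verify` only says the block cannot be trusted; recovering the correct contents still needs a redundant copy or a repair pass such as fsck.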