From: Bill Davidsen <davidsen@tmr.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Peter Rabbitson <rabbit+list@rabbit.us>,
	NeilBrown <neilb@suse.de>,
	Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>,
	Goswin von Brederlow <goswin-v-b@web.de>,
	Doug Ledford <dledford@redhat.com>,
	Michael Evans <mjevans1983@gmail.com>,
	Eyal Lebedinsky <eyal@eyal.emu.id.au>,
	linux-raid list <linux-raid@vger.kernel.org>
Subject: Re: mismatch_cnt again
Date: Thu, 12 Nov 2009 17:40:05 -0500
Message-ID: <4AFC8EC5.6060400@tmr.com>
In-Reply-To: <yq1tyx2o4td.fsf@sermon.lab.mkp.net>

Martin K. Petersen wrote:
>>>>>> "Peter" == Peter Rabbitson <rabbit+list@rabbit.us> writes:
>>>>>>             
>
> Peter> Bingo - and according to the list archive many of us are getting
> Peter> mismatches without swap anywhere near the raid in question. The
> Peter> current situation is more akin to "Ok folks get in the plane,
> Peter> we're deploying in 2 hours, and btw your chute is not going to
> Peter> open and there is nothing you can do about it" How is that for a
> Peter> threat model :)
>
> Way back we used to lock pages down entirely for I/O submission.  At
> some point the writeback bit was introduced to gate the page during the
> actual (physical) write operation only.  That made locking trickier and
> not all filesystems correctly adapted to this.  ext[234] in particular
> have issues of varying degrees, somewhat amplified by their use of
> buffer_heads to track buffers instead of pages.  See the recent thread
> about corruption with ext4 in 2.6.32+ for examples of this.
>
> It's not just RAID consistency that breaks.  In the ext4 case above we
> end up with garbled blocks being written to a single drive.
>
> Add data integrity protection to the mix (btrfs, DIX) and all hell
> breaks loose if you change the buffer after the checksum has been
> generated.  So while modifying pages in flight has kinda-sorta worked
> for a while (i.e. the window of error is small) it's something we'll
> simply have to stop doing to support new features in the storage stack.
> You'll be glad to know there's discussion about merging the debug patch
> (which marks pages read-only during writeback) into ext4.
>
> FWIW, XFS and btrfs both use the page writeback bit correctly and never
> change a page while it is undergoing I/O.
>
>   
That's necessary but not sufficient. To be done correctly, the data must 
also be protected by md. This is because some applications, swap and 
databases being the most common cases, use arrays without a filesystem. 
Data simply can't be correct on the drive if it is allowed to change 
between the write system call and its arrival on the media, and even more 
so when a CRC or a mirror is involved.
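To make the argument concrete, here is a toy Python simulation. It is not md code, and every name in it (`raid1_write`, `count_mismatches`, `scribble`, `BLOCK`) is hypothetical. It writes one page to two mirrors and lets a simulated application change the page after the first mirror has been written. Without a private copy, the mirrors diverge and a CRC generated at submission no longer matches what reaches the second device, which is the kind of mismatch a `check` pass reports. The `stable_copy` option assumes md snapshots the page into its own buffer before submitting it. That is one possible way md could protect the data, not necessarily how it would actually be done.

```python
import zlib

BLOCK = 4096  # hypothetical block size, for illustration only


def raid1_write(page, mirrors, lba, modify_in_flight=None, stable_copy=False):
    """Toy model of one RAID1 write; not md's actual code path.

    page: bytearray owned by the "application" (may change during I/O).
    mirrors: list of dicts mapping lba -> bytes, one dict per device.
    modify_in_flight: optional callback run after the first mirror is
        written, standing in for a writer that changes the page mid-I/O.
    stable_copy: if True, snapshot the page before any device sees it.
    Returns True if every device received data matching the CRC computed
    at submission time.
    """
    src = bytes(page) if stable_copy else page
    checksum = zlib.crc32(src)  # generated once, when the write is submitted
    ok = True
    for i, dev in enumerate(mirrors):
        data = bytes(src)  # contents of the buffer when this device reads it
        dev[lba] = data
        if zlib.crc32(data) != checksum:
            ok = False
        if i == 0 and modify_in_flight is not None:
            modify_in_flight(page)
    return ok


def count_mismatches(mirrors):
    """Count blocks whose copies differ between mirrors.

    A rough analogue of what a 'check' pass reports; md's own counter
    is kept in sectors, while this sketch counts whole blocks.
    """
    return sum(
        1 for lba in mirrors[0]
        if len({dev.get(lba) for dev in mirrors}) > 1
    )


def scribble(page):
    page[0] ^= 0xFF  # flip one byte while the write is "in flight"


if __name__ == "__main__":
    for stable in (False, True):
        mirrors = [{}, {}]
        page = bytearray(BLOCK)
        ok = raid1_write(page, mirrors, lba=0,
                         modify_in_flight=scribble, stable_copy=stable)
        print(f"stable_copy={stable}: checksum ok={ok}, "
              f"mismatches={count_mismatches(mirrors)}")
```

Run as a script, it reports one mismatched block and a failed checksum without the copy, and no mismatches with it. The copy costs an extra memcpy per write. Having the filesystem keep pages stable avoids that cost, but it does nothing for swap or databases writing to the array directly.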

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein

