From: "Stuart D. Gathman" <>
To: LVM general discussion and development <>
Subject: Re: [linux-lvm] Power loss consistency for RAID
Date: Mon, 18 Mar 2019 08:38:39 -0400 (EDT)
Message-ID: <>
In-Reply-To: <>

On Sun, 17 Mar 2019, Zheng Lv wrote:

> I'm recently considering using software RAID instead of hardware controllers 
> for my home server.
> AFAIK, a write operation on a RAID array is not atomic across disks. I'm
> concerned about what happens to RAID1/5/6/10 LVs after power loss.
> Is manual recovery required, or is it automatically checked and repaired on 
> LV activation?
> Also I'm curious about how such recovery works internally.

I use md raid1 and raid10.  I recommend md over LVM RAID, which is
newer.  Create your RAID volumes with md, and add them as PVs, as in
this pvs output:

   PV         VG      Fmt  Attr PSize   PFree
   /dev/md1   vg_span lvm2 a--u 214.81g      0
   /dev/md2   vg_span lvm2 a--u 214.81g  26.72g
   /dev/md3   vg_span lvm2 a--u 249.00g 148.00g
   /dev/md4   vg_span lvm2 a--u 252.47g 242.47g

Note that, unlike hardware RAID, you do not need matching drives: you
can add disks, and mix and match equal-sized partitions on drives of
differing sizes.  LVM RAID does this allocation automatically; with md,
you assign partitions to block devices manually.  Since there are only
a few (large) partitions to assign, it is a pleasant, human-sized
exercise.
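Checking such a hand-made layout is simple enough to script.  The
sketch below is hypothetical (the drive names, sizes, and the
check_layout helper are made up for illustration, not taken from any
tool): it verifies that each md array has equal-sized legs on distinct
drives, and that the partitions carved from each drive fit on it.

```python
from collections import defaultdict

# Hypothetical data model: each md array maps to its legs as
# (drive, partition size in GiB) pairs; capacities maps drive -> GiB.
Layout = dict[str, list[tuple[str, int]]]


def check_layout(layout: Layout, capacities: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks sane."""
    problems = []
    usage: dict[str, int] = defaultdict(int)
    for md, legs in layout.items():
        drives = [drive for drive, _ in legs]
        sizes = {size for _, size in legs}
        # Two legs of one array on the same drive would defeat the mirror.
        if len(set(drives)) != len(drives):
            problems.append(f"{md}: two legs on the same drive")
        # md sizes an array by its smallest member, so mismatched legs waste space.
        if len(sizes) > 1:
            problems.append(f"{md}: legs differ in size {sorted(sizes)}")
        for drive, size in legs:
            usage[drive] += size
    # The partitions carved from each drive must fit on that drive.
    for drive, used in usage.items():
        available = capacities.get(drive, 0)
        if used > available:
            problems.append(f"{drive}: {used} GiB allocated, {available} available")
    return problems


# Three mirrors spread over two 500 GiB drives and one 1000 GiB drive.
drives = {"sda": 500, "sdb": 500, "sdc": 1000}
layout = {
    "md1": [("sda", 250), ("sdc", 250)],
    "md2": [("sdb", 250), ("sdc", 250)],
    "md3": [("sda", 250), ("sdb", 250)],
}
print(check_layout(layout, drives))  # prints []
```

The point of the example is that the larger drive simply hosts legs of
two different arrays; no drive has to match any other.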

Striping and mirroring schemes such as raid0, raid1, and raid10 are
actually faster with software RAID, but I avoid schemes with
read-modify-write (RMW) cycles, such as raid5; you really need the
hardware for those.
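To show what the RMW cycle involves, here is a minimal sketch of
RAID5's XOR parity.  It is a model, not md code: a small write must
first read the old data and old parity, then write both new data and
new parity.  It also shows the well-known "write hole", one reason
battery-backed hardware helps with parity RAID: if power fails between
the two writes, the stale parity no longer matches the data, and a
later rebuild of another block from it returns garbage.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (RAID5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rmw_write(stripe: list[bytes], parity: bytes, index: int, new_data: bytes):
    """Small write to one data block of a RAID5 stripe via read-modify-write.

    Needs two reads (old data, old parity) before two writes (new data,
    new parity); the other data blocks in the stripe are not read.
    """
    old_data = stripe[index]                             # read 1 (old parity is read 2)
    new_parity = xor_blocks(parity, old_data, new_data)  # P' = P ^ D_old ^ D_new
    new_stripe = list(stripe)
    new_stripe[index] = new_data                         # write 1: data
    return new_stripe, new_parity                        # write 2: parity


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(*stripe)
new_stripe, new_parity = rmw_write(stripe, parity, 1, b"\xaa\xbb")

# Power loss after the data write but before the parity write:
# rebuilding block 0 from the stale parity does not give block 0 back.
rebuilt = xor_blocks(parity, new_stripe[1], new_stripe[2])
```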

I use raid1 when the filesystem needs to be readable without the md
driver, as with /boot.  md's raid10 provides striping as well as
mirroring with however many drives you have (I usually have 3 or 4).
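How raid10 can mirror across an odd number of drives is easiest to see
from its default "near" layout, where the copies of each chunk are
placed consecutively and wrap across the devices.  The sketch below is
a simplified model (it ignores chunk size and data offsets, and
raid10_near is a made-up helper), assuming the default of two copies.

```python
def raid10_near(chunk: int, n_devices: int, copies: int = 2) -> list[tuple[int, int]]:
    """(device, row) of every copy of `chunk` in a simplified raid10 'near' layout."""
    placements = []
    for c in range(copies):
        slot = chunk * copies + c  # copies of a chunk occupy consecutive slots...
        placements.append((slot % n_devices, slot // n_devices))  # ...wrapping across devices
    return placements


# Three drives, two copies: print the layout row by row.
rows = {}
for chunk in range(6):
    for dev, row in raid10_near(chunk, 3):
        rows.setdefault(row, ["?"] * 3)[dev] = f"A{chunk + 1}"
for row in sorted(rows):
    print(" ".join(rows[row]))
# A1 A1 A2
# A2 A3 A3
# A4 A4 A5
# A5 A6 A6
```

Every chunk still lives on two different drives, and sequential reads
are spread over all three.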

Here is a brief overview of md recovery and diagnostics.  Someone else
will have to fill in the mechanics of LVM RAID.

md keeps a version (the event count) in the superblock of each member
device of an md array.  When the array is assembled, md marks a leg
with an older version as failed and replaced, and begins to resync it.
Newer superblock formats can also keep a write-intent bitmap, so that
md only needs to resync the areas that may have been modified.
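The recovery logic described above can be modelled in a few lines.
This is a hypothetical sketch, not md's implementation: the Leg class
and assemble function are invented names, the event count is just an
integer, and the write-intent bitmap is modelled as a set of block
indices.  The leg with the highest event count is treated as current,
and each stale leg is resynced from it, either in full or only where
the bitmap says writes may have been in flight.

```python
from dataclasses import dataclass, field


@dataclass
class Leg:
    name: str
    events: int                      # modelled superblock event count
    blocks: list = field(default_factory=list)


def assemble(legs, dirty_blocks=None):
    """Resync stale legs from the freshest one (simplified model).

    dirty_blocks models a write-intent bitmap; None means no bitmap,
    so every block of a stale leg must be copied.
    Returns (names of resynced legs, number of blocks copied).
    """
    freshest = max(legs, key=lambda leg: leg.events)
    resynced, copied = [], 0
    for leg in legs:
        if leg.events == freshest.events:
            continue                 # already current
        if dirty_blocks is None:
            indices = range(len(freshest.blocks))
        else:
            indices = sorted(dirty_blocks)
        for i in indices:
            leg.blocks[i] = freshest.blocks[i]
            copied += 1
        leg.events = freshest.events  # the leg is current again
        resynced.append(leg.name)
    return resynced, copied


a = Leg("sda1", 42, [1, 2, 3, 4])
b = Leg("sdb1", 40, [1, 0, 3, 0])    # missed the last writes
print(assemble([a, b], dirty_blocks={1, 3}))  # prints (['sdb1'], 2)
```

With a bitmap only two blocks are copied; without one, the same
recovery would copy all four.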

On most distros, a cron job (check_raid) compares the legs once a week
(the schedule is configurable).  If it encounters a read error on either
drive, md immediately rewrites that block from the good drive, which
makes a modern drive reassign the bad sector.  (On ancient drives, a
write error during this resync marks the drive as failed.)  If the two
legs do not match, which can happen for legitimate reasons involving
write optimizations for swap volumes and such, md arbitrarily copies
one leg to the other and keeps a count of the mismatches.  (IMO it
should also log the block offset, so that I could occasionally check
that the mismatch occurred in an expected volume.)
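md exposes this machinery in sysfs: /sys/block/<md>/md/sync_action
shows what the array is doing (for example idle, resync, or check), and
writing check to it starts a scrub; /sys/block/<md>/md/mismatch_cnt
holds the mismatch count mentioned above.  The helpers below are a
minimal sketch with made-up names; distro scrub scripts typically do
something similar, but this is not their code.  The sysfs_root
parameter exists only so the functions can be pointed at a test
directory.

```python
from pathlib import Path


def md_check_status(md: str, sysfs_root: str = "/sys") -> dict:
    """Read the current sync action and the mismatch count of an md array."""
    base = Path(sysfs_root) / "block" / md / "md"
    return {
        "sync_action": (base / "sync_action").read_text().strip(),
        "mismatch_cnt": int((base / "mismatch_cnt").read_text().strip()),
    }


def start_check(md: str, sysfs_root: str = "/sys") -> None:
    """Ask md to scrub (compare) the legs of the array; needs root on a real system."""
    (Path(sysfs_root) / "block" / md / "md" / "sync_action").write_text("check\n")
```

On a real system, md_check_status("md2") after a scrub reports whether
it has finished and how many mismatches it counted; it does not tell
you where they were, which is exactly the missing block offset.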

 	      Stuart D. Gathman <>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
