* "Missing" RAID devices
@ 2013-05-21 12:51 Jim Santos
  2013-05-21 15:31 ` Phil Turmel
  2013-05-21 16:23 ` Doug Ledford
  0 siblings, 2 replies; 23+ messages in thread
From: Jim Santos @ 2013-05-21 12:51 UTC (permalink / raw)
  To: linux-raid

Hi,
I recently upgraded my 2 disk RAID-1 array from 1.5TB to 2TB disks.
When I started I had 10 MD devices. Since the last partition was
small, I removed the filesystems and deleted the associated RAID
device. Then I created two new devices and split the extra 500 GB
between them. Everything was good until I rebooted. Now two of the
raid devices are 'gone'.

Here is what mdadm shows at the moment:

santos@bender:/etc/mdadm$ sudo mdadm --examine --scan

ARRAY /dev/md120 UUID=3fa7ec2e:7a19093c:34d348bd:5f81ddcb
ARRAY /dev/md119 UUID=302f5a8a:f963c76e:53433e04:4146b04d
ARRAY /dev/md125 UUID=a0535363:3799280a:1b1dd2b9:44680fa2
ARRAY /dev/md124 UUID=54742635:24ba2035:bcddc93e:73fc2c19
ARRAY /dev/md123 UUID=56613b81:36bc9bcd:fe538cc8:51817c6b
ARRAY /dev/md126 UUID=c5d4e3e5:ab520847:f5483f4e:4a7373f8
ARRAY /dev/md127 UUID=3983523a:740e09fa:84f2985e:c6521efa
ARRAY /dev/md121 UUID=35bd0dff:1fa423f5:f8fb6389:ecaefea8
ARRAY /dev/md122 UUID=ebbd009c:33fcc44c:8907793b:1d800ee1
ARRAY /dev/md/11 metadata=1.2 UUID=32bde3b1:b0475886:3ce4bba7:3bd12900 name=bender:11
ARRAY /dev/md/12 metadata=1.2 UUID=0257dbbd:42d8d666:2173709f:4dd0a1a6 name=bender:12

The last two devices are the new ones.


fstab:

UUID="57382674-3289-4cc8-a2d0-a57d44ea1458" /home ext3 defaults 2
UUID="478cca47-6f26-4550-9813-223d0d65b851" /mnt/part06 ext3 defaults 2
UUID="58c85e32-07d9-445d-b971-34d96c03a765" /mnt/part07 ext3 defaults 2
UUID="5d7d89cd-c04d-4916-bb9a-a28256d928a9" /mnt/part08 ext3 defaults 2
UUID="1da60cba-5f1d-4a8f-925a-6e3d80611b8d" /mnt/part09 ext3 defaults 2
UUID="241199c9-4b6a-49d6-a270-1db07691f99c" /mnt/part10 ext3 defaults
2 <-- Failed to mount
UUID="af7dc965-261b-4995-84ca-0ebb0c014efb" /mnt/part11 ext3 defaults
2 <-- Failed to mount
UUID="6a896a4e-aee0-41a8-8a29-2cb021f4149c" /mnt/part12 ext3 defaults 2
UUID="96379e3c-78f6-4cb4-984a-85be455263c3" /mnt/part13 ext3 defaults 2
UUID="2f8c7084-8610-4f11-97b8-5bc25ad9a7df" /mnt/part14 ext4 defaults 2
UUID="015fe797-e79c-4dd0-bbef-4c6c59287262" /mnt/part15 ext4 defaults 2

mount:

/dev/md120 on /home type ext3 (rw)
/dev/md119 on /mnt/part06 type ext3 (rw)
/dev/md125 on /mnt/part07 type ext3 (rw)
/dev/md124 on /mnt/part08 type ext3 (rw)
/dev/md123 on /mnt/part09 type ext3 (rw)
/dev/md121 on /mnt/part12 type ext3 (rw)
/dev/md122 on /mnt/part13 type ext3 (rw)
/dev/md126 on /mnt/part14 type ext4 (rw) <--- New
/dev/md127 on /mnt/part15 type ext4 (rw) <--- New


What seems to have happened is that /dev/md/11 and /dev/md/12 are now
named /dev/md126 and /dev/md127. If I examine a couple of the
partitions that should be part of /dev/md126 and /dev/md127, I see
this:

santos@bender:/etc/mdadm$ sudo mdadm --examine /dev/sda7

/dev/sda7:
Magic : a92b4efc
Version : 0.90.00
UUID : c5d4e3e5:ab520847:f5483f4e:4a7373f8
Creation Time : Fri Oct 30 12:30:04 2009
Raid Level : raid1
Used Dev Size : 157292288 (150.01 GiB 161.07 GB)
Array Size : 157292288 (150.01 GiB 161.07 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 126

Update Time : Mon May 20 15:00:19 2013
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : fff3c4a9 - correct
Events : 7003

Number Major Minor RaidDevice State
this 0 8 7 0 active sync /dev/sda7

0 0 8 7 0 active sync /dev/sda7
1 1 8 23 1 active sync /dev/sdb7

santos@bender:/etc/mdadm$ sudo mdadm --examine /dev/sda8

/dev/sda8:
Magic : a92b4efc
Version : 0.90.00
UUID : 3983523a:740e09fa:84f2985e:c6521efa
Creation Time : Fri Oct 30 12:30:21 2009
Raid Level : raid1
Used Dev Size : 157292288 (150.01 GiB 161.07 GB)
Array Size : 157292288 (150.01 GiB 161.07 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 127

Update Time : Mon May 20 15:00:19 2013
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 47e70f24 - correct
Events : 1665

Number Major Minor RaidDevice State
this 0 8 8 0 active sync /dev/sda8

0 0 8 8 0 active sync /dev/sda8
1 1 8 24 1 active sync /dev/sdb8

Notice the Preferred Minor numbers. It looks like the two new RAID
devices took over the devices with the highest minor numbers.

I don't know what I did to get into this situation, but could really
use some help getting out of it.

TIA


* Re: "Missing" RAID devices
  2013-05-21 12:51 "Missing" RAID devices Jim Santos
@ 2013-05-21 15:31 ` Phil Turmel
  2013-05-21 22:22   ` Jim Santos
  2013-05-21 16:23 ` Doug Ledford
  1 sibling, 1 reply; 23+ messages in thread
From: Phil Turmel @ 2013-05-21 15:31 UTC (permalink / raw)
  To: Jim Santos; +Cc: linux-raid

Hi Jim,

On 05/21/2013 08:51 AM, Jim Santos wrote:
> Hi,
> I recently upgraded my 2 disk RAID-1 array from 1.5TB to 2TB disks.
> When I started I had 10 MD devices. Since the last partition was
> small, I removed the filesystems and deleted the associated RAID
> device. Then I created two new devices and split the extra 500 GB
> between them. Everything was good until I rebooted. Now two of the
> raid devices are 'gone'.

[snip /]

> Notice the Preferred Minor numbers. It looks like the two new RAID
> devices took over the devices with the highest minor numbers.

Preferred minors are only used when assembling with kernel internal
auto-assembly (deprecated), which only works on meta-data v0.90, and
only if an initramfs is not present.  Boot-time assembly is otherwise
governed by the copy of mdadm.conf in your initramfs.

You appear to have failed to update your initramfs.  This is complicated
by your failure to avoid mdadm's "fallback" minor numbers that are used
when an array is assembled without an entry in mdadm.conf.

> I don't know what I did to get into this situation, but could really
> use some help getting out of it.

mdadm is called by modern initramfs boot scripts to assemble raid
devices as they are encountered.  If the given device is not a member of
an array listed in mdadm.conf, mdadm picks the next unused minor number
starting at 127 and counting down.  Mdadm must have found the members of
your new arrays before it found members of the arrays your old
mdadm.conf listed for md127 and md126.

Any time you update your mdadm.conf in your root filesystem, you must
remember to regenerate your initramfs so mdadm has the correct
information at boot time (before root is mounted).

To minimize future confusion, I recommend you renumber all of your
arrays in mdadm.conf starting with minor number 1.  Then update your
initramfs.  Then reboot.

Your fstab uses uuids (wisely), so you don't need any particular minor
numbers.  So you can and should avoid the minor numbers mdadm will use
as defaults.
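
On a Debian/Ubuntu-style system (your /etc/mdadm prompt suggests one), that
boils down to roughly the following; command names differ on other distros,
so treat this as a sketch rather than a recipe:

  # see the ARRAY lines as mdadm would generate them today
  sudo mdadm --detail --scan
  # edit /etc/mdadm/mdadm.conf so the ARRAY lines use /dev/md1, /dev/md2, ...
  sudoedit /etc/mdadm/mdadm.conf
  # rebuild the initramfs so the boot-time copy matches, then reboot
  sudo update-initramfs -u      # on Fedora/RHEL-style systems: dracut -f
  sudo reboot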

HTH,

Phil


* Re: "Missing" RAID devices
  2013-05-21 12:51 "Missing" RAID devices Jim Santos
  2013-05-21 15:31 ` Phil Turmel
@ 2013-05-21 16:23 ` Doug Ledford
  2013-05-21 17:03   ` Drew
  1 sibling, 1 reply; 23+ messages in thread
From: Doug Ledford @ 2013-05-21 16:23 UTC (permalink / raw)
  To: Jim Santos; +Cc: linux-raid

On 05/21/2013 08:51 AM, Jim Santos wrote:
> santos@bender:/etc/mdadm$ sudo mdadm --examine /dev/sda7
> 
> /dev/sda7:
> Magic : a92b4efc
> Version : 0.90.00
            ^^^^^^^

There's your problem.  Seriously.

> Preferred Minor : 126

So, how did you ever get version 0.90 superblocks that count from 127
backwards?  Mdadm doesn't do that.  In fact, mdadm relies on you *not*
doing that.

Here's the deal.  When you use version 0.90 superblocks, the number is
taken from the superminor field, usually starting at 0 and counting up,
and the device file is then /dev/md<number>.  With version 1.x
superblocks, we care about the name of the device, not the number, and
the name is taken from the name field of the superblock, and we create
the device as /dev/md/<name>.  However, when this support was added, the
kernel didn't support named elements (aka, you couldn't have a md/root
device in the kernel namespace, it needed to be md<number>), so the
/dev/md/<name> file is actually a symlink to a /dev/md<number> file, and
we would allocate from 127 and count backwards so that they would be as
unlikely as possible to conflict with numbered names from version 0.90
superblocks.

You are running into that impossible conflict.  I would remake all of
your version 0.90 raid arrays as version 1.0 raid arrays (the superblock
should sit in the exact same space and the arrays should be the same
size, but I can't say that for certain because newer mdadm might reserve
more space between the end of the filesystem and the superblock than
older mdadm did, so a test would be necessary first), and in the process
I would give them all names, and then I would totally eliminate all
references to /dev/md<number> in your system setup and stick with just
/dev/md/<name> for everything, and then I would remake your initrd and
be done with it all.
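
For one of your 0.90 pairs that might look roughly like this -- the array
name is just an illustration, /dev/sda7 and /dev/sdb7 are the pair from your
--examine output, and you'd want to verify sizes on a scratch array before
touching real data:

  # with the old 0.90 array stopped and unmounted, recreate it in place
  # with a 1.0 superblock and a name (name here is only an example)
  mdadm --create /dev/md/example --metadata=1.0 --name=example --level=1 \
        --raid-devices=2 --assume-clean /dev/sda7 /dev/sdb7
  # sanity-check the filesystem read-only before trusting the result
  fsck -n /dev/md/example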



* Re: "Missing" RAID devices
  2013-05-21 16:23 ` Doug Ledford
@ 2013-05-21 17:03   ` Drew
       [not found]     ` <519BDC8C.1040202@hardwarefreak.com>
  0 siblings, 1 reply; 23+ messages in thread
From: Drew @ 2013-05-21 17:03 UTC (permalink / raw)
  To: Jim Santos; +Cc: linux-raid

Hi Jim,

The other question I'd ask is why do you have 10 raid1 arrays on those
two disks?

Given you have an initramfs, at most you should have separate
partitions (raid'd) for /boot & root. Everything else should be broken
down using LVM. Way more flexible to move things around in future as
required.


-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

"This started out as a hobby and spun horribly out of control."
-Unknown


* Re: "Missing" RAID devices
       [not found]     ` <519BDC8C.1040202@hardwarefreak.com>
@ 2013-05-21 21:02       ` Drew
  2013-05-21 22:06         ` Stan Hoeppner
  0 siblings, 1 reply; 23+ messages in thread
From: Drew @ 2013-05-21 21:02 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Jim Santos, Linux RAID Mailing List

On Tue, May 21, 2013 at 1:43 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 5/21/2013 12:03 PM, Drew wrote:
>> Hi Jim,
>>
>> The other question I'd ask is why do you have 10 raid1 arrays on those
>> two disks?
>
> No joke.  That setup is ridiculous.  RAID exists to guard against a
> drive failure, not as a substitute for volume management.
>
>> Given you have an initramfs, at most you should have separate
>> partitions (raid'd) for /boot & root. Everything else should be broken
>> down using LVM. Way more flexible to move things around in future as
>> required.
>
> LVM isn't even required.  Using partitions (atop MD) or a single large
> filesystem (XFS) with quotas works just as well.

Agreed. For simple setups, a single boot & root is just fine.

I'd assumed the OP's reasons for using multiple partitions were valid,
so keeping those partitions on top of a single raid array meant LVM was
the best choice.


-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

"This started out as a hobby and spun horribly out of control."
-Unknown


* Re: "Missing" RAID devices
  2013-05-21 21:02       ` Drew
@ 2013-05-21 22:06         ` Stan Hoeppner
  0 siblings, 0 replies; 23+ messages in thread
From: Stan Hoeppner @ 2013-05-21 22:06 UTC (permalink / raw)
  To: Drew; +Cc: Jim Santos, Linux RAID Mailing List

On 5/21/2013 4:02 PM, Drew wrote:
> On Tue, May 21, 2013 at 1:43 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 5/21/2013 12:03 PM, Drew wrote:
>>> Hi Jim,
>>>
>>> The other question I'd ask is why do you have 10 raid1 arrays on those
>>> two disks?
>>
>> No joke.  That setup is ridiculous.  RAID exists to guard against a
>> drive failure, not as a substitute for volume management.
>>
>>> Given you have an initramfs, at most you should have separate
>>> partitions (raid'd) for /boot & root. Everything else should be broken
>>> down using LVM. Way more flexible to move things around in future as
>>> required.
>>
>> LVM isn't even required.  Using partitions (atop MD) or a single large
>> filesystem (XFS) with quotas works just as well.
> 
> Agreed. For simple setups, a single boot & root is just fine.
> 
> I'd assumed the OP's reasons for using multiple partitions was valid,
> so keeping those partitions over top a single raid array meant LVM was
> the best choice.

We don't have enough information yet to make such a determination.
Multiple LVM devices may most closely mimic his current setup, but that
doesn't mean it's the best choice.  It doesn't mean it's not either.  We
simply haven't been informed why he was using 10 md/RAID1 devices.  My
gut instinct says it's simply a lack of education, not a special
requirement.

The principal reason for such a setup is to prevent runaway processes
from filling the storage.  Thus /var, which normally contains the logs
and mail spool, is often put on a separate partition.  This problem can
also be addressed using filesystem quotas.

There is more than one way to skin a cat, as they say.  If he's using
these 10 partitions simply for organization purposes, then there's no
need for 10 LVM devices nor FS quotas on a single FS, but simply a good
directory hierarchy.

-- 
Stan



* Re: "Missing" RAID devices
  2013-05-21 15:31 ` Phil Turmel
@ 2013-05-21 22:22   ` Jim Santos
  2013-05-22  0:02     ` Phil Turmel
  0 siblings, 1 reply; 23+ messages in thread
From: Jim Santos @ 2013-05-21 22:22 UTC (permalink / raw)
  To: linux-raid

Hi,

Thanks for pointing out the initramfs problem.  I guess I should have
figured that out myself, since I've had to update initramfs in the
past, but it just totally slipped my mind.  And the strange device
numbering just threw me completely off track.

As far as how the devices got numbered that way in the first place, I
really don't know.  I assembled them and that is how it came out.
Since I was initially doing this to learn about SW RAID, I'm sure that
I made a rookie mistake or two along the way.

The reason that there are so many filesystems is that I wanted to try
to minimize any loss if one of them got corrupted.  Maybe it isn't the
best way to do it, but it made sense to me at the time.  I am more
than open to suggestions.

When I started doing this to better understand SW RAID, I wanted to
make things as simple as possible, so I didn't use LVM. That, and
it didn't seem like I would gain much by using it.  All I need is
simple RAID1 devices; I never planned on changing the layout other than
maybe increasing the size of the disks.  Maybe that flies in the face
of 'best practices', since you can't be sure what your future needs
will be.  How would you suggest I set things up if I did use LVs?

/boot and / are on a separate disk on RAID1 devices with 1.x
superblocks.  At the moment, they are the only things that aren't
giving me a problem :-)

Many thanks,

Jim




On Tue, May 21, 2013 at 11:31 AM, Phil Turmel <philip@turmel.org> wrote:
> Hi Jim,
>
> On 05/21/2013 08:51 AM, Jim Santos wrote:
>> Hi,
>> I recently upgraded my 2 disk RAID-1 array from 1.5TB to 2TB disks.
>> When I started I had 10 MD devices. Since the last partition was
>> small, I removed the filesystems and deleted the associated RAID
>> device. Then I created two new devices and split the extra 500 GB
>> between them. Everything was good until I rebooted. Now two of the
>> raid devices are 'gone'.
>
> [snip /]
>
>> Notice the Preferred Minor numbers. It looks like the two new RAID
>> devices took over the devices with the highest minor numbers.
>
> Preferred minors are only used when assembling with kernel internal
> auto-assembly (deprecated), which only works on meta-data v0.90, and
> only if an initramfs is not present.  Boot-time assembly is otherwise
> governed by the copy of mdadm.conf in your initramfs.
>
> You appear to have failed to update your initramfs.  This is complicated
> by your failure to avoid mdadm's "fallback" minor numbers that are used
> when an array is assembled without an entry in mdadm.conf.
>
>> I don't know what I did to get into this situation, but could really
>> use some help getting out of it.
>
> mdadm is called by modern initramfs boot scripts to assemble raid
> devices as they are encountered.  If the given device is not a member of
> an array listed in mdadm.conf, mdadm picks the next unused minor number
> starting at 127 and counting down.  Mdadm must have found the members of
> your new arrays before it found members of the arrays your old
> mdadm.conf listed for md127 and md126.
>
> Any time you update your mdadm.conf in your root filesystem, you must
> remember to regenerate your initramfs so mdadm has the correct
> information at boot time (before root is mounted).
>
> To minimize future confusion, I recommend you renumber all of your
> arrays in mdadm.conf starting with minor number 1.  Then update your
> initramfs.  Then reboot.
>
> Your fstab uses uuids (wisely), so you don't need any particular minor
> numbers.  So you can and should avoid the minor numbers mdadm will use
> as defaults.
>
> HTH,
>
> Phil


* Re: "Missing" RAID devices
  2013-05-21 22:22   ` Jim Santos
@ 2013-05-22  0:02     ` Phil Turmel
  2013-05-22  0:16       ` Jim Santos
  2013-05-22 22:43       ` Stan Hoeppner
  0 siblings, 2 replies; 23+ messages in thread
From: Phil Turmel @ 2013-05-22  0:02 UTC (permalink / raw)
  To: Jim Santos; +Cc: linux-raid

Hi Jim,

On 05/21/2013 06:22 PM, Jim Santos wrote:
> Hi,
> 
> Thanks for pointing out the initramfs problem.  I guess I should have
> figured that out myself, since I've had to update initramfs in the
> past, but it just totally slipped my mind.  And the strange device
> numbering just threw me completely off track.

Does this mean you're back to running?  Did you follow my instructions?

> As far as how the devices got numbered that way in the first place, I
> really don't know.  I assembled them and that is how it came out.
> Since I was initially doing this to learn about SW RAID, I'm sure that
> I made a rookie mistake or two along the way.

No problem.  You probably rebooted once between creating all your raids
and generating the mdadm.conf file.  (Using mdadm -Es >>/etc/mdadm.conf)

The reboot would have caused initramfs assembly without instructions,
using available minors starting at 127.  Then the --scan into mdadm.conf
would have "locked it in".

> The reason that there are so many filesystems is that I wanted to try
> to minimize any loss if one of them got corrupted.  Maybe it isn't the
> best way to do it, but it made sense to me at the time.  I am more
> than open to suggestions.
> 
> When I started doing this to better understand SW RAID, I wanted to
> make things as simple as possible so I didn't use the LVM.  That and
> it didn't seem like I would gain much by using it.  All I need is
> simple RAID1 devices; I never planned on changing the layout other than
> maybe increasing the size of the disks.  Maybe that flies in the face
> of 'best practices', since you can't be sure what your future needs
> would be.  How would you suggest I set things up if I did use LVs?

Simple is good.  My preferred setup for light duty is two arrays spread
over all available disks.  First is /dev/md1, a small (~500m) n-way
mirror with v1.0 metadata for use as /boot.  The other, /dev/md2, uses
the balance of the disks in either raid10,far3 or raid6.  If raid6, I
use a chunk size of 16k.
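
Creating that pair on three disks looks something like this (device names
are only an example):

  mdadm --create /dev/md1 --metadata=1.0 --level=1 --raid-devices=3 \
        /dev/sda1 /dev/sdb1 /dev/sdc1
  mdadm --create /dev/md2 --level=10 --layout=f3 --raid-devices=3 \
        /dev/sda2 /dev/sdb2 /dev/sdc2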

I put LVM on top of /dev/md2, with LVs for swap, /, /home, /tmp, and
/bulk.  The latter is for photos, music, video, mythtv, et cetera.  I
generally leave 10% of the volume group unallocated until I see how the
usage patterns go.  LVM makes it easy to add space to existing LVs on
the run--even for the root filesystem.
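
In command form that layering is roughly (VG and LV names and sizes are
just examples):

  pvcreate /dev/md2
  vgcreate vg0 /dev/md2
  lvcreate -L 8G -n swap vg0
  lvcreate -L 20G -n root vg0
  lvcreate -L 40G -n home vg0
  # mkswap / mkfs each LV as usual; leave ~10% of the VG unallocated and
  # grow LVs later with lvextend plus the filesystem's own resize tool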

LVM also makes it possible to move LVs from one array to another without
downtime.  This is especially handy when you have a root filesystem
inside a raid10.  (MD raid10 cannot be reshaped yet.)

Anyways, you asked my opinion.  I don't run any heavy duty systems, so
look to others for those situations.

> /boot and / are on a separate disk on RAID1 devices with 1.x
> superblocks.  At the moment, they are the only thing that aren't
> giving me a problem :-)

I guess that means the answers to my first questions are no?

Phil

ps.  The convention on kernel.org is to use reply-to-all, to trim
replies, and to either bottom-post or interleave.  FWIW.


* Re: "Missing" RAID devices
  2013-05-22  0:02     ` Phil Turmel
@ 2013-05-22  0:16       ` Jim Santos
  2013-05-22 22:43       ` Stan Hoeppner
  1 sibling, 0 replies; 23+ messages in thread
From: Jim Santos @ 2013-05-22  0:16 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On Tue, May 21, 2013 at 8:02 PM, Phil Turmel <philip@turmel.org> wrote:
>
> Does this mean you're back to running?  Did you follow my instructions?
>
I hadn't when I last posted, but by following your instructions I got
everything working a few minutes ago.

Thanks for giving me a description of how you like to lay things out.

Jim


* Re: "Missing" RAID devices
  2013-05-22  0:02     ` Phil Turmel
  2013-05-22  0:16       ` Jim Santos
@ 2013-05-22 22:43       ` Stan Hoeppner
  2013-05-22 23:26         ` Phil Turmel
  1 sibling, 1 reply; 23+ messages in thread
From: Stan Hoeppner @ 2013-05-22 22:43 UTC (permalink / raw)
  To: Phil Turmel, Linux RAID

Sorry for the dup Phil, hit the wrong reply button.

On 5/21/2013 7:02 PM, Phil Turmel wrote:
...
> ...First is /dev/md1, a small (~500m) n-way
> ...as /boot.  The other, /dev/md2, uses
> ...raid10,far3 or raid6.
> 
> I put LVM on top of /dev/md2, with LVs for swap, ... /tmp

Swap and tmp atop an LV atop RAID6?  The former will always RMW on page
writes, the latter quite often will cause RMW.  As you stated, your
performance requirements are modest.  However, for the archives, putting
swap on a parity array, let alone a double parity array, is not good
practice.

-- 
Stan



* Re: "Missing" RAID devices
  2013-05-22 22:43       ` Stan Hoeppner
@ 2013-05-22 23:26         ` Phil Turmel
  2013-05-23  5:59           ` Stan Hoeppner
  2013-05-23  8:22           ` David Brown
  0 siblings, 2 replies; 23+ messages in thread
From: Phil Turmel @ 2013-05-22 23:26 UTC (permalink / raw)
  To: stan; +Cc: Linux RAID

On 05/22/2013 06:43 PM, Stan Hoeppner wrote:
> Sorry for the dup Phil, hit the wrong reply button.

No worries.

> On 5/21/2013 7:02 PM, Phil Turmel wrote:
> ...
>> ...First is /dev/md1, a small (~500m) n-way
>> ...as /boot.  The other, /dev/md2, uses
>> ...raid10,far3 or raid6.
>>
>> I put LVM on top of /dev/md2, with LVs for swap, ... /tmp
> 
> Swap and tmp atop an LV atop RAID6?  The former will always RMW on page
> writes, the latter quite often will cause RMW.  As you stated your
> performance requirements are modest.  However, for the archives, putting
> swap on a parity array, let alone a double parity array, is not good
> practice.

Ah, good point.  Hasn't hurt me yet, but it would if I pushed anything
hard.  I'll have to revise my baseline to always have a small raid10,f3
to go with the raid6.

Meanwhile, I'm applying some of the general ideas I've seen from you:
I've acquired a pair of Crucial M4 SSDs for my new home media server to
keep small files and databases away from the bulk storage.  Not in
service yet, but I'm very pleased so far.

I'm pretty sure the new kit is way overkill for a media server... :-)

Thanks,

Phil



* Re: "Missing" RAID devices
  2013-05-22 23:26         ` Phil Turmel
@ 2013-05-23  5:59           ` Stan Hoeppner
  2013-05-23  8:30             ` keld
  2013-05-23  8:22           ` David Brown
  1 sibling, 1 reply; 23+ messages in thread
From: Stan Hoeppner @ 2013-05-23  5:59 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Linux RAID

On 5/22/2013 6:26 PM, Phil Turmel wrote:
> On 05/22/2013 06:43 PM, Stan Hoeppner wrote:
>> Sorry for the dup Phil, hit the wrong reply button.
> 
> No worries.
> 
>> On 5/21/2013 7:02 PM, Phil Turmel wrote:
>> ...
>>> ...First is /dev/md1, a small (~500m) n-way
>>> ...as /boot.  The other, /dev/md2, uses
>>> ...raid10,far3 or raid6.
>>>
>>> I put LVM on top of /dev/md2, with LVs for swap, ... /tmp
>>
>> Swap and tmp atop an LV atop RAID6?  The former will always RMW on page
>> writes, the latter quite often will cause RMW.  As you stated your
>> performance requirements are modest.  However, for the archives, putting
>> swap on a parity array, let alone a double parity array, is not good
>> practice.
> 
> Ah, good point.  Hasn't hurt me yet, but it would if I pushed anything
> hard.  I'll have to revise my baseline to always have a small raid10,f3
> to go with the raid6.

Yeah, the kicker here is that swap on a parity array seems to work fine,
right up until the moment it doesn't.  And that's when the kernel goes
into heavy swapping due to any number of causes.  When that happens,
you're heavily into RMW, disk heads are bang'n, latency goes through the
roof.  If any programs are trying to access files on the parity array,
say a mildly busy IMAP, FTP, etc, server, everything grinds to a halt.

With your particular setup, instead you might use n additional
partitions, one each across the physical disks that comprise your n-way
RAID1.  You would configure the partition type of each as (82) Linux
swap, and add them all to fstab with equal priority.  The kernel will
interleave the 4KB swap page writes evenly across all of these
partitions, yielding swap performance similar to an n-way RAID0 stripe.

The downside to this setup is the kernel probably crashes if you lose
one of these disks and thus the swap partition on it.  So you could
simply make another md/RAID1 of these n partitions if n is an odd number
of spindles.  Or n/2 RAID1 arrays if n is even.  Then put one swap
partition on each RAID1 device and do swap interleaving across the RAID1
pairs as described above in the non RAID case.
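
As a sketch, with four disks the paired variant comes out roughly like this
(partition names are illustrative):

  mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
  mdadm --create /dev/md11 --level=1 --raid-devices=2 /dev/sdc3 /dev/sdd3
  mkswap /dev/md10 && mkswap /dev/md11
  # equal priority in /etc/fstab so the kernel interleaves pages across both
  # /dev/md10  none  swap  sw,pri=10  0  0
  # /dev/md11  none  swap  sw,pri=10  0  0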

The reason for this last configuration is simple-- more swap throughput
for the same number of physical writes.  With a 4 drive RAID1 and a
single swap partition atop, each 4KB page write to swap generates a 4KB
write to each of the 4 disks, 16KB total.  If you create two RAID1s and
put a swap partition on each and interleave them, each 4KB page write to
swap generates only two 4KB writes, 8KB total.  Here for each 16KB
written you're moving two pages to swap instead of one.  Thus your swap
bandwidth is doubled.  But you still have redundancy and crash avoidance
if one disk fails.  You may be tempted to use md/RAID10 of some layout
to optimize for writes, but you'd gain nothing, and you'd lose some
performance due to overhead.  The partitions you'll be using in this
case are so small that they easily fit in a single physical disk track,
thus no head movement is required to seek between sectors, only rotation
of the platter.

Another advantage to this hybrid approach is less disk space consumed.
If you need 8GB of swap, a 4-way RAID1 swap partition requires 32GB of
disk space, 8GB per disk.  With the n/2 RAID1 approach and 4 disks it
requires half that, 16GB.  With the no redundancy interleaved approach
it requires 1/4th, only 2GB per disk, 8GB total.  With today's
mechanical disk capacities this isn't a concern.  But if using SSDs it
can be.

> Meanwhile, I'm applying some of the general ideas I've seen from you:
> I've acquired a pair of Crucial M4 SSDs for my new home media server to
> keep small files and databases away from the bulk storage.  Not in
> service yet, but I'm very pleased so far.

If the two are competing for seeks thus slowing everything down, moving
the random access stuff to SSD should help.

> I'm pretty sure the new kit is way overkill for a media server... :-)

Not so many years ago folks would have said the same about 4TB mech
drives. ;)

-- 
Stan



* Re: "Missing" RAID devices
  2013-05-22 23:26         ` Phil Turmel
  2013-05-23  5:59           ` Stan Hoeppner
@ 2013-05-23  8:22           ` David Brown
  1 sibling, 0 replies; 23+ messages in thread
From: David Brown @ 2013-05-23  8:22 UTC (permalink / raw)
  To: Phil Turmel; +Cc: stan, Linux RAID

On 23/05/13 01:26, Phil Turmel wrote:
> On 05/22/2013 06:43 PM, Stan Hoeppner wrote:
>> Sorry for the dup Phil, hit the wrong reply button.
> 
> No worries.
> 
>> On 5/21/2013 7:02 PM, Phil Turmel wrote:
>> ...
>>> ...First is /dev/md1, a small (~500m) n-way
>>> ...as /boot.  The other, /dev/md2, uses
>>> ...raid10,far3 or raid6.
>>>
>>> I put LVM on top of /dev/md2, with LVs for swap, ... /tmp
>>
>> Swap and tmp atop an LV atop RAID6?  The former will always RMW on page
>> writes, the latter quite often will cause RMW.  As you stated your
>> performance requirements are modest.  However, for the archives, putting
>> swap on a parity array, let alone a double parity array, is not good
>> practice.
> 
> Ah, good point.  Hasn't hurt me yet, but it would if I pushed anything
> hard.  I'll have to revise my baseline to always have a small raid10,f3
> to go with the raid6.
> 

Always use raid1 (or raid10) for your swap - that is, assuming you want
it on raid at all.

Raid is all about uptime - look at what is likely to go wrong, what the
consequences of that problem are, and the cost of protecting against it.
 If your swap is seldom used (as is normally the case), and you can live
with the consequences of swap failing (i.e., any process using swap will
die - but everything else, including data on raid, will be fine), then
don't put swap on raid.  If it is cheaper to buy more ram than extra
spindles for swap on raid, then buy more ram and avoid swap.

But if you still feel you need swap on raid, then use raid1-style
mirroring.  Stan has given you all the details.


Also consider putting /tmp on tmpfs.  Then you don't need to worry about
raid here, and if it overflows to disk then the extra data goes out into
swap.
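
Something like this fstab line does it (the size cap is just an example):

  tmpfs  /tmp  tmpfs  defaults,size=2G,mode=1777  0  0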




* Re: "Missing" RAID devices
  2013-05-23  5:59           ` Stan Hoeppner
@ 2013-05-23  8:30             ` keld
  2013-05-24  3:45               ` Stan Hoeppner
  0 siblings, 1 reply; 23+ messages in thread
From: keld @ 2013-05-23  8:30 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Phil Turmel, Linux RAID

On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
> On 5/22/2013 6:26 PM, Phil Turmel wrote:
> > On 05/22/2013 06:43 PM, Stan Hoeppner wrote:
> >> Sorry for the dup Phil, hit the wrong reply button.
> > 
> > No worries.
> > 
> >> On 5/21/2013 7:02 PM, Phil Turmel wrote:
> >> ...
> >>> ...First is /dev/md1, a small (~500m) n-way
> >>> ...as /boot.  The other, /dev/md2, uses
> >>> ...raid10,far3 or raid6.
> >>>
> >>> I put LVM on top of /dev/md2, with LVs for swap, ... /tmp
> >>
> >> Swap and tmp atop an LV atop RAID6?  The former will always RMW on page
> >> writes, the latter quite often will cause RMW.  As you stated your
> >> performance requirements are modest.  However, for the archives, putting
> >> swap on a parity array, let alone a double parity array, is not good
> >> practice.
> > 
> > Ah, good point.  Hasn't hurt me yet, but it would if I pushed anything
> > hard.  I'll have to revise my baseline to always have a small raid10,f3
> > to go with the raid6.
> 
> Yeah, the kicker here is that swap on a parity array seems to work fine,
> right up until the moment it doesn't.  And that's when the kernel goes
> into heavy swapping due to any number of causes.  When that happens,
> you're heavily into RMW, disk heads are bang'n, latency goes through the
> roof.  If any programs are trying to access files on the parity array,
> say a mildly busy IMAP, FTP, etc, server, everything grinds to a halt.
> 
> With your particular setup, instead you might use n additional
> partitions, one each across the physical disks that comprise your n-way
> RAID1.  You would configure the partition type of each as (82) Linux
> swap, and add them all to fstab with equal priority.  The kernel will
> interleave the 4KB swap page writes evenly across all of these
> partitions, yielding swap performance similar to an n-way RAID0 stripe.
> 
> The downside to this setup is the kernel probably crashes if you lose
> one of these disks and thus the swap partition on it.  So you could
> simply make another md/RAID1 of these n partitions if n is an odd number
> of spindles.  Or n/2 RAID1 arrays if n is even.  Then put one swap
> partition on each RAID1 device and do swap interleaving across the RAID1
> pairs as described above in the non RAID case.
> 
> The reason for this last configuration is simple-- more swap throughput
> for the same number of physical writes.  With a 4 drive RAID1 and a
> single swap partition atop, each 4KB page write to swap generates a 4KB
> write to each of the 4 disks, 16KB total.  If you create two RAID1s and
> put a swap partition on each and interleave them, each 4KB page write to
> swap generates only two 4KB writes, 8KB total.  Here for each 16KB
> written you're moving two pages to swap instead of one.  Thus your swap
> bandwidth is doubled.  But you still have redundancy and crash avoidance
> if one disk fails.  You may be tempted to use md/RAID10 of some layout
> to optimize for writes, but you'd gain nothing, and you'd lose some
> performance due to overhead.  The partitions you'll be using in this
> case are so small that they easily fit in a single physical disk track,
> thus no head movement is required to seek between sectors, only rotation
> of the platter.
> 
> Another advantage to this hybrid approach is less disk space consumed.
> If you need 8GB of swap, a 4-way RAID1 swap partition requires 32GB of
> disk space, 8GB per disk.  With the n/2 RAID1 approach and 4 disks it
> requires half that, 16GB.  With the no redundancy interleaved approach
> it requires 1/4th, only 2GB per disk, 8GB total.  With today's
> mechanical disk capacities this isn't a concern.  But if using SSDs it
> can be.
> 
> > Meanwhile, I'm applying some of the general ideas I've seen from you:
> > I've acquired a pair of Crucial M4 SSDs for my new home media server to
> > keep small files and databases away from the bulk storage.  Not in
> > service yet, but I'm very pleased so far.
> 
> If the two are competing for seeks thus slowing everything down, moving
> the random access stuff to SSD should help.
> 
> > I'm pretty sure the new kit is way overkill for a media server... :-)
> 
> Not so many years ago folks would have said the same about 4TB mech
> drives. ;)

I think raid10,far3 is a good choice for swap: you will enjoy
RAID0-like read speed, good write speed (compared to raid6),
and a chance of the system surviving as long as just one drive keeps functioning.
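
For example, on three disks (partitions are illustrative):

  mdadm --create /dev/md3 --level=10 --layout=f3 --raid-devices=3 \
        /dev/sda3 /dev/sdb3 /dev/sdc3
  mkswap /dev/md3 && swapon /dev/md3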

best regards
keld


* Re: "Missing" RAID devices
  2013-05-23  8:30             ` keld
@ 2013-05-24  3:45               ` Stan Hoeppner
  2013-05-24  6:32                 ` keld
  0 siblings, 1 reply; 23+ messages in thread
From: Stan Hoeppner @ 2013-05-24  3:45 UTC (permalink / raw)
  To: keld; +Cc: Phil Turmel, Linux RAID

On 5/23/2013 3:30 AM, keld@keldix.com wrote:
> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:

>> You may be tempted to use md/RAID10 of some layout
>> to optimize for writes, but you'd gain nothing, and you'd lose some
>> performance due to overhead.  The partitions you'll be using in this
>> case are so small that they easily fit in a single physical disk track,
>> thus no head movement is required to seek between sectors, only rotation
>> of the platter.
...
> I think a raid10,far3 is a good choice for swap, then you will enjoy
> RAID0-like reading speed. and good write speed (compared to raid6),
> and a chance of live surviving if just one drive keeps functioning.

As I mention above, none of the md/RAID10 layouts will yield any added
performance benefit for swap partitions.  And I state the reason why.
If you think about this for a moment you should reach the same conclusion.

-- 
Stan



* Re: "Missing" RAID devices
  2013-05-24  3:45               ` Stan Hoeppner
@ 2013-05-24  6:32                 ` keld
  2013-05-24  7:37                   ` Stan Hoeppner
  2013-05-24  9:23                   ` David Brown
  0 siblings, 2 replies; 23+ messages in thread
From: keld @ 2013-05-24  6:32 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Phil Turmel, Linux RAID

On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote:
> On 5/23/2013 3:30 AM, keld@keldix.com wrote:
> > On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
> 
> >> You may be tempted to use md/RAID10 of some layout
> >> to optimize for writes, but you'd gain nothing, and you'd lose some
> >> performance due to overhead.  The partitions you'll be using in this
> >> case are so small that they easily fit in a single physical disk track,
> >> thus no head movement is required to seek between sectors, only rotation
> >> of the platter.
> ...
> > I think a raid10,far3 is a good choice for swap, then you will enjoy
> > RAID0-like reading speed. and good write speed (compared to raid6),
> > and a chance of live surviving if just one drive keeps functioning.
> 
> As I mention above, none of the md/RAID10 layouts will yield any added
> performance benefit for swap partitions.  And I state the reason why.
> If you think about this for a moment you should reach the same conclusion.

I think it is you who are not fully acquainted with Linux MD. Linux
MD RAID10,far3 offers improved performance for single reads, which is an
advantage for swap, when you are swapping in. Think about it and try it
out for yourself. Especially if we are talking 3 drives (far3), but also
when you are talking more drives and only 2 copies. You don't get raid0
read performance in Linux on a combination of raid1 and raid0.

best regards
keld


* Re: "Missing" RAID devices
  2013-05-24  6:32                 ` keld
@ 2013-05-24  7:37                   ` Stan Hoeppner
  2013-05-24 17:15                     ` keld
  2013-05-24  9:23                   ` David Brown
  1 sibling, 1 reply; 23+ messages in thread
From: Stan Hoeppner @ 2013-05-24  7:37 UTC (permalink / raw)
  To: keld; +Cc: Phil Turmel, Linux RAID

On 5/24/2013 1:32 AM, keld@keldix.com wrote:
> On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote:
>> On 5/23/2013 3:30 AM, keld@keldix.com wrote:
>>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
>>
>>>> You may be tempted to use md/RAID10 of some layout
>>>> to optimize for writes, but you'd gain nothing, and you'd lose some
>>>> performance due to overhead.  The partitions you'll be using in this
>>>> case are so small that they easily fit in a single physical disk track,
>>>> thus no head movement is required to seek between sectors, only rotation
>>>> of the platter.
>> ...
>>> I think a raid10,far3 is a good choice for swap, then you will enjoy
>>> RAID0-like reading speed. and good write speed (compared to raid6),
>>> and a chance of live surviving if just one drive keeps functioning.
>>
>> As I mention above, none of the md/RAID10 layouts will yield any added
>> performance benefit for swap partitions.  And I state the reason why.
>> If you think about this for a moment you should reach the same conclusion.
> 
> I think it is you who are not fully acquainted with Linux MD. Linux
> MD RAID10,far3 offers improved performance in single read, 

On most of today's systems, read performance is largely irrelevant WRT
swap performance.  However write performance is critical.  None of the
md/RAID10 layouts are going to increase write throughput over RAID1
pairs.  And all the mirrored RAIDs will be 2x slower than interleaved
swap across direct disk partitions.

-- 
Stan



* Re: "Missing" RAID devices
  2013-05-24  6:32                 ` keld
  2013-05-24  7:37                   ` Stan Hoeppner
@ 2013-05-24  9:23                   ` David Brown
  2013-05-24 18:03                     ` keld
  1 sibling, 1 reply; 23+ messages in thread
From: David Brown @ 2013-05-24  9:23 UTC (permalink / raw)
  To: keld; +Cc: Stan Hoeppner, Phil Turmel, Linux RAID

On 24/05/13 08:32, keld@keldix.com wrote:
> On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote:
>> On 5/23/2013 3:30 AM, keld@keldix.com wrote:
>>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
>>
>>>> You may be tempted to use md/RAID10 of some layout
>>>> to optimize for writes, but you'd gain nothing, and you'd lose some
>>>> performance due to overhead.  The partitions you'll be using in this
>>>> case are so small that they easily fit in a single physical disk track,
>>>> thus no head movement is required to seek between sectors, only rotation
>>>> of the platter.
>> ...
>>> I think a raid10,far3 is a good choice for swap, then you will enjoy
>>> RAID0-like reading speed. and good write speed (compared to raid6),
>>> and a chance of live surviving if just one drive keeps functioning.
>>
>> As I mention above, none of the md/RAID10 layouts will yield any added
>> performance benefit for swap partitions.  And I state the reason why.
>> If you think about this for a moment you should reach the same conclusion.
> 
> > I think it is you who are not fully acquainted with Linux MD. Linux
> MD RAID10,far3 offers improved performance in single read, which is an
> > advantage for swap, when you are swapping in. Think about it and try it out for yourself.
> Especially if we are talking 3 drives (far3), but also when you are
> talking more drives and only 2 copies. You don't get raid0 read performance in Linux
> on a combination of raid1 and raid0.
> 

I think you are getting a number of things wrong here.  For general
usage, especially on a two disk system, raid10,f2 is very often an
excellent choice of setup - it gives you protection (two copies of
everything) and fast reads (you get striped read performance, and always
from the faster outer half of the disk).  You pay a higher write latency
compared to plain raid1, but with typical usage figures of 5 reads per
write, that's fine.  And normally you don't have to wait for writes to
finish anyway.

But swap is different in many ways.

First, the read/write ratio for swap is much closer to 1 - it can even
be lower than 1.  (Things like startup code for programs can get pushed
to swap and never read again, as can leaked memory from buggy programs.)

Secondly, write latency is a big factor - data is pushed to swap to free
up memory for other usage, and that has to wait until the write is complete.

Thirdly, the kernel will handle striping of multiple swap partitions
automatically.  And it will do it in a way that is optimal for swap
usage, rather than the chunk sizes used by a striped raid system.  (More
often, the kernel wants parallel access to different parts of swap,
rather than single large reads or writes.)


One thing that seems to be slightly confused here in this thread is the
mixup between the number of mirror copies and the number of drives in
raid10 setups.  With md raid, you can have as many mirrors as you like
over as many drives as you like, though you need at least as many
partitions as mirrors (and it seldom makes sense to have more mirrors
than drives).  For example, if you have 3 disks, you can use "far3"
layout to get three copies of your data - one copy on each disk.  But
you can also use "far2", and get two copies of your data.  See
<http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10>
for some pictures.

With plain raid1, if you use 3 drives you get three copies.

It seems unlikely to me that you would need the "safe against two disk
failure" protection of 3-way mirrors on swap, but it is possible.


Back to swap.

If you don't need protection for your swap (swap should not often be in
use, and a dead disk will lead to crashes on swapped-out processes but
should not cause more problems than that), put a small partition on each
disk, and add them all to swap.  The kernel will handle striping of the
swap partitions.  There is nothing you can do to make it faster.

When you want protection, raid1 is your best choice.  Make small
partitions on each disk, then pair them up as a number of raid1 pairs,
and add each of these as swap.  Your system will survive any disk
failure, or multiple failures as long as they are from different pairs.
 Again, there is nothing you can do to make it faster.


The important factor here is to minimise write latency.  You do that by
keeping the layers as simple as possible - raid1 is simpler and faster
than raid10 on two disks.  With small partitions, head movement and the
bandwidth differences between inner and outer tracks makes no
difference, so "far" layout is no benefit.

Theoretically, a set of raid10,f2 pairs rather than raid1 pairs would
allow faster reading of large chunks of swap - assuming, of course, that
the rest of the system supports such large I/O bandwidth.  But such
large streaming reads do not often happen with swap - more commonly, the
kernel will jump around in its accesses.  Large reads that use all
spindles are good for the throughput for large streamed reads, but they
also block all disks and increase the latency for random accesses which
are the common case for swap.


I'm a great fan of raid10,f2 - I think it is an optimal choice for many
uses, and shows a power and flexibility of Linux's md system that is
well above what you can get with hardware raid (or software raid on
other OS's).  But for swap, you want raid1.






* Re: "Missing" RAID devices
  2013-05-24  7:37                   ` Stan Hoeppner
@ 2013-05-24 17:15                     ` keld
  2013-05-24 19:05                       ` Stan Hoeppner
  0 siblings, 1 reply; 23+ messages in thread
From: keld @ 2013-05-24 17:15 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Phil Turmel, Linux RAID

On Fri, May 24, 2013 at 02:37:01AM -0500, Stan Hoeppner wrote:
> On 5/24/2013 1:32 AM, keld@keldix.com wrote:
> > On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote:
> >> On 5/23/2013 3:30 AM, keld@keldix.com wrote:
> >>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
> >>
> >>>> You may be tempted to use md/RAID10 of some layout
> >>>> to optimize for writes, but you'd gain nothing, and you'd lose some
> >>>> performance due to overhead.  The partitions you'll be using in this
> >>>> case are so small that they easily fit in a single physical disk track,
> >>>> thus no head movement is required to seek between sectors, only rotation
> >>>> of the platter.
> >> ...
> >>> I think a raid10,far3 is a good choice for swap, then you will enjoy
> >>> RAID0-like reading speed. and good write speed (compared to raid6),
> >>> and a chance of live surviving if just one drive keeps functioning.
> >>
> >> As I mention above, none of the md/RAID10 layouts will yield any added
> >> performance benefit for swap partitions.  And I state the reason why.
> >> If you think about this for a moment you should reach the same conclusion.
> > 
> > I think it is you who are not fully acquainted with Linux MD. Linux
> > MD RAID10,far3 offers improved performance in single read, 
> 
> On most of today's systems, read performance is largely irrelevant WRT
> swap performance.  However write performance is critical.  None of the
> md/RAID10 layouts are going to increase write throughput over RAID1
> pairs.  And all the mirrored RAIDs will be 2x slower than interleaved
> swap across direct disk partitions.

In my experience read performance from swap is critical, at least
on single user systems. E.g. swapping in firefox or libreoffice
may take quite some time, and there raid10,far helps by almost halving
the time for swapping in. Writes are not important, as long as you are not thrashing.
In general, halving the swap-in time with raid10,far is nice for a process, but
for small processes it is not noticeable for a laptop user or a
server user, say http or ftp.

best regards
keld


* Re: "Missing" RAID devices
  2013-05-24  9:23                   ` David Brown
@ 2013-05-24 18:03                     ` keld
  0 siblings, 0 replies; 23+ messages in thread
From: keld @ 2013-05-24 18:03 UTC (permalink / raw)
  To: David Brown; +Cc: Stan Hoeppner, Phil Turmel, Linux RAID

On Fri, May 24, 2013 at 11:23:30AM +0200, David Brown wrote:
> On 24/05/13 08:32, keld@keldix.com wrote:
> > On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote:
> >> On 5/23/2013 3:30 AM, keld@keldix.com wrote:
> >>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
> >>
> >>>> You may be tempted to use md/RAID10 of some layout
> >>>> to optimize for writes, but you'd gain nothing, and you'd lose some
> >>>> performance due to overhead.  The partitions you'll be using in this
> >>>> case are so small that they easily fit in a single physical disk track,
> >>>> thus no head movement is required to seek between sectors, only rotation
> >>>> of the platter.
> >> ...
> >>> I think a raid10,far3 is a good choice for swap, then you will enjoy
> >>> RAID0-like reading speed. and good write speed (compared to raid6),
> >>> and a chance of live surviving if just one drive keeps functioning.
> >>
> >> As I mention above, none of the md/RAID10 layouts will yield any added
> >> performance benefit for swap partitions.  And I state the reason why.
> >> If you think about this for a moment you should reach the same conclusion.
> > 
> > > I think it is you who are not fully acquainted with Linux MD. Linux
> > MD RAID10,far3 offers improved performance in single read, which is an
> > > advantage for swap, when you are swapping in. Think about it and try it out for yourself.
> > Especially if we are talking 3 drives (far3), but also when you are
> > talking more drives and only 2 copies. You don't get raid0 read performance in Linux
> > on a combination of raid1 and raid0.
> > 
> 
> I think you are getting a number of things wrong here.  For general
> usage, especially on a two disk system, raid10,f2 is very often an
> excellent choice of setup - it gives you protection (two copies of
> everything) and fastreads (you get striped read performance, and always
> from the faster outer half of the disk).  You pay a higher write latency
> compared to plain raid1, but with typical usage figures of 5 reads per
> write, that's fine.  And normally you don't have to wait for writes to
> finish anyway.
> 
> But swap is different in many ways.
> 
> First, the read/write ratio for swap is much closer to 1 - it can even
> be lower than 1.  (Things like startup code for programs can get pushed
> to swap and never read again, as can leaked memory from buggy programs.)
> 
> Secondly, write latency is a big factor - data is pushed to swap to free
> up memory for other usage, and that has to wait until the write is complete.

Agreed

> Thirdly, the kernel will handle striping of multiple swap partitions
> automatically.  And it will do it in a way that is optimal for swap
> usage, rather than the chunk sizes used by a striped raid system.  (More
> often, the kernel wants parallel access to different parts of swap,
> rather than single large reads or writes.)

Yes, the kernel will handle striping, but not mirroring, if you do not employ raid.

> 
> One thing that seems to be slightly confused here in this thread is the
> mixup between the number of mirror copies and the number of drives in
> raid10 setups.  With md raid, you can have as many mirrors as you like
> over as many drives as you like, though you need at least as many
> partitions as mirrors (and it seldom makes sense to have more mirrors
> than drives).  For example, if you have 3 disks, you can use "far3"
> layout to get three copies of your data - one copy on each disk.  But
> you can also use "far2", and get two copies of your data.  See
> <http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10>
> for some pictures.
> 
> With plain raid1, if you use 3 drives you get three copies.
> 
> It seems unlikely to me that you would need the "safe against two disk
> failure" protection of 3-way mirrors on swap, but it is possible.

Yes, it is possible, and why not do it? Swap is mostly a very small
part of the total disk space, so that seems to be very cheap, and it also
gives an identical disk layout in many situations.

> 
> 
> Back to swap.
> 
> If you don't need protection for your swap (swap should not often be in
> use, and a dead disk will lead to crashes on swapped-out processes but
> should not cause more problems than that), put a small partition on each
> disk, and add them all to swap.  The kernel will handle striping of the
> swap partitions.  There is nothing you can do to make it faster.

I think it is serious that a process, or a number of processes, fail because of failing
disks. And it does not cost much disk space to protect against this. It does
cost double/triple write IO, but that is probably worth it too.

I do think having a uniform disk space with raid0-like read properties does
speed up reading. The kernel cannot evenly spread IO over the disks,
as the chunks it needs to read may be different in size. raid10,far automatically
does this even spread. And if you need mirrored raid, then no other mirrored
raid type gives you raid0 read speed.


> When you want protection, raid1 is your best choice.  Make small
> partitions on each disk, then pair them up as a number of raid1 pairs,
> and add each of these as swap.  Your system will survive any disk
> failure, or multiple failures as long as they are from different pairs.
>  Again, there is nothing you can do to make it faster.

Raid1 is only half as fast as raid10,far for single reads.

> 
> The important factor here is to minimise write latency.  You do that by
> keeping the layers as simple as possible - raid1 is simpler and faster
> than raid10 on two disks.  With small partitions, head movement and the
> bandwidth differences between inner and outer tracks makes no
> difference, so "far" layout is no benefit.

The IO scheduler takes care of latency problems, grouping the
right tracks together for the write tasks in the far layout.

Yes, for the far layout and small partitions like swap, the difference
between the speed of inner and outer tracks is insignificant.

> Theoretically, a set of raid10,f2 pairs rather than raid1 pairs would
> allow faster reading of large chunks of swap - assuming, of course, that
> the rest of the system supports such large I/O bandwidth.  But such
> large streaming reads do not often happen with swap - more commonly, the
> kernel will jump around in its accesses.  Large reads that use all
> spindles are good for the throughput for large streamed reads, but they
> also block all disks and increase the latency for random accesses which
> are the common case for swap.

I have examples of large swap-ins, like firefox and flash.
> 
> I'm a great fan of raid10,f2 - I think it is an optimal choice for many
> uses, and shows a power and flexibility of Linux's md system that is
> well above what you can get with hardware raid (or software raid on
> other OS's).  But for swap, you want raid1.

raid1 and raid10,f2 are about the same for sequential write, which is what
is used for swap write io. Single read speed is far better for the far layout.
So why choose the slower raid1?
https://raid.wiki.kernel.org/index.php/Performance

best regards
keld


* Re: "Missing" RAID devices
  2013-05-24 17:15                     ` keld
@ 2013-05-24 19:05                       ` Stan Hoeppner
  2013-05-24 19:22                         ` keld
  0 siblings, 1 reply; 23+ messages in thread
From: Stan Hoeppner @ 2013-05-24 19:05 UTC (permalink / raw)
  To: keld; +Cc: Phil Turmel, Linux RAID

On 5/24/2013 12:15 PM, keld@keldix.com wrote:
> On Fri, May 24, 2013 at 02:37:01AM -0500, Stan Hoeppner wrote:
>> On 5/24/2013 1:32 AM, keld@keldix.com wrote:
>>> On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote:
>>>> On 5/23/2013 3:30 AM, keld@keldix.com wrote:
>>>>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
>>>>
>>>>>> You may be tempted to use md/RAID10 of some layout
>>>>>> to optimize for writes, but you'd gain nothing, and you'd lose some
>>>>>> performance due to overhead.  The partitions you'll be using in this
>>>>>> case are so small that they easily fit in a single physical disk track,
>>>>>> thus no head movement is required to seek between sectors, only rotation
>>>>>> of the platter.
>>>> ...
>>>>> I think a raid10,far3 is a good choice for swap; then you will enjoy
>>>>> RAID0-like reading speed, good write speed (compared to raid6), and a
>>>>> chance of the system surviving as long as just one drive keeps functioning.
>>>>
>>>> As I mention above, none of the md/RAID10 layouts will yield any added
>>>> performance benefit for swap partitions.  And I state the reason why.
>>>> If you think about this for a moment you should reach the same conclusion.
>>>
>>> I think it is you who are not fully acquainted with Linux MD. Linux
>>> MD RAID10,far3 offers improved performance in single reads,
>>
>> On most of today's systems, read performance is largely irrelevant WRT
>> swap performance.  However write performance is critical.  None of the
>> md/RAID10 layouts are going to increase write throughput over RAID1
>> pairs.  And all the mirrored RAIDs will be 2x slower than interleaved
>> swap across direct disk partitions.
> 
> In my experience read performance from swap is critical, at least
> on single-user systems. E.g. swapping in firefox or libreoffice
> may take quite some time, and there raid10,far helps by almost halving
> the time for the swapping in. Writes are not important, as long as you are not thrashing.

If a single user system has multiple drives configured in RAID10 and
productivity applications are being swapped, then the user should be
smacked in the head.  2GB DIMMs are $10.  Any hard drive is $50+ but
usually much more.

This is not a valid argument.

> In general halving the swapping in with raid10,far is nice for a process, but
> for small processes it is not noticeable for a laptop user or a
> server user, say http or ftp.

Neither is this.  Laptop users don't run RAID10.  And server swap
performance is all about page write, not read, as I previously stated.

-- 
Stan


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: "Missing" RAID devices
  2013-05-24 19:05                       ` Stan Hoeppner
@ 2013-05-24 19:22                         ` keld
  2013-05-25  1:42                           ` Stan Hoeppner
  0 siblings, 1 reply; 23+ messages in thread
From: keld @ 2013-05-24 19:22 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Phil Turmel, Linux RAID

On Fri, May 24, 2013 at 02:05:44PM -0500, Stan Hoeppner wrote:
> On 5/24/2013 12:15 PM, keld@keldix.com wrote:
> > On Fri, May 24, 2013 at 02:37:01AM -0500, Stan Hoeppner wrote:
> >> On 5/24/2013 1:32 AM, keld@keldix.com wrote:
> >>> On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote:
> >>>> On 5/23/2013 3:30 AM, keld@keldix.com wrote:
> >>>>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
> >>>>
> >>>>>> You may be tempted to use md/RAID10 of some layout
> >>>>>> to optimize for writes, but you'd gain nothing, and you'd lose some
> >>>>>> performance due to overhead.  The partitions you'll be using in this
> >>>>>> case are so small that they easily fit in a single physical disk track,
> >>>>>> thus no head movement is required to seek between sectors, only rotation
> >>>>>> of the platter.
> >>>> ...
> >>>>> I think a raid10,far3 is a good choice for swap; then you will enjoy
> >>>>> RAID0-like reading speed, good write speed (compared to raid6), and a
> >>>>> chance of the system surviving as long as just one drive keeps functioning.
> >>>>
> >>>> As I mention above, none of the md/RAID10 layouts will yield any added
> >>>> performance benefit for swap partitions.  And I state the reason why.
> >>>> If you think about this for a moment you should reach the same conclusion.
> >>>
> >>> I think it is you who are not fully acquainted with Linux MD. Linux
> >>> MD RAID10,far3 offers improved performance in single reads,
> >>
> >> On most of today's systems, read performance is largely irrelevant WRT
> >> swap performance.  However write performance is critical.  None of the
> >> md/RAID10 layouts are going to increase write throughput over RAID1
> >> pairs.  And all the mirrored RAIDs will be 2x slower than interleaved
> >> swap across direct disk partitions.
> > 
> > In my experience read performance from swap is critical, at least
> > on single-user systems. E.g. swapping in firefox or libreoffice
> > may take quite some time, and there raid10,far helps by almost halving
> > the time for the swapping in. Writes are not important, as long as you are not thrashing.
> 
> If a single user system has multiple drives configured in RAID10 and
> productivity applications are being swapped, then the user should be
> smacked in the head.  2GB DIMMs are $10.  Any hard drive is $50+ but
> usually much more.
> 
> This is not a valid argument.

And the cost to select, buy, and install RAM is much more than USD 10.
And some systems don't have room for more RAM, etc.

> > In general halving the swapping in with raid10,far is nice for a process, but
> > for small processes it is not noticeable for a laptop user or a
> > server user, say http or ftp.
> 
> Neither is this.  Laptop users don't run RAID10.  And server swap
> performance is all about page write, not read, as I previously stated.

Some laptop users and desktop users run raid10. I think all laptop
and desktop users should have at least 2 disks and run mirrored raid on
them, both for security and for performance.

Properly configured servers will benefit from snappy swap read
performance. Write performance of swap would normally not be noticeable
- it is not a bottleneck.

best regards
keld

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: "Missing" RAID devices
  2013-05-24 19:22                         ` keld
@ 2013-05-25  1:42                           ` Stan Hoeppner
  0 siblings, 0 replies; 23+ messages in thread
From: Stan Hoeppner @ 2013-05-25  1:42 UTC (permalink / raw)
  To: keld; +Cc: Phil Turmel, Linux RAID

On 5/24/2013 2:22 PM, keld@keldix.com wrote:
> On Fri, May 24, 2013 at 02:05:44PM -0500, Stan Hoeppner wrote:
>> On 5/24/2013 12:15 PM, keld@keldix.com wrote:
>>> On Fri, May 24, 2013 at 02:37:01AM -0500, Stan Hoeppner wrote:
>>>> On 5/24/2013 1:32 AM, keld@keldix.com wrote:
>>>>> On Thu, May 23, 2013 at 10:45:56PM -0500, Stan Hoeppner wrote:
>>>>>> On 5/23/2013 3:30 AM, keld@keldix.com wrote:
>>>>>>> On Thu, May 23, 2013 at 12:59:39AM -0500, Stan Hoeppner wrote:
>>>>>>
>>>>>>>> You may be tempted to use md/RAID10 of some layout
>>>>>>>> to optimize for writes, but you'd gain nothing, and you'd lose some
>>>>>>>> performance due to overhead.  The partitions you'll be using in this
>>>>>>>> case are so small that they easily fit in a single physical disk track,
>>>>>>>> thus no head movement is required to seek between sectors, only rotation
>>>>>>>> of the platter.
>>>>>> ...
>>>>>>> I think a raid10,far3 is a good choice for swap; then you will enjoy
>>>>>>> RAID0-like reading speed, good write speed (compared to raid6), and a
>>>>>>> chance of the system surviving as long as just one drive keeps functioning.
>>>>>>
>>>>>> As I mention above, none of the md/RAID10 layouts will yield any added
>>>>>> performance benefit for swap partitions.  And I state the reason why.
>>>>>> If you think about this for a moment you should reach the same conclusion.
>>>>>
>>>>> I think it is you who are not fully acquainted with Linux MD. Linux
>>>>> MD RAID10,far3 offers improved performance in single reads,
>>>>
>>>> On most of today's systems, read performance is largely irrelevant WRT
>>>> swap performance.  However write performance is critical.  None of the
>>>> md/RAID10 layouts are going to increase write throughput over RAID1
>>>> pairs.  And all the mirrored RAIDs will be 2x slower than interleaved
>>>> swap across direct disk partitions.
>>>
>>> In my experience read performance from swap is critical, at least
>>> on single-user systems. E.g. swapping in firefox or libreoffice
>>> may take quite some time, and there raid10,far helps by almost halving
>>> the time for the swapping in. Writes are not important, as long as you are not thrashing.
>>
>> If a single user system has multiple drives configured in RAID10 and
>> productivity applications are being swapped, then the user should be
>> smacked in the head.  2GB DIMMs are $10.  Any hard drive is $50+ but
>> usually much more.
>>
>> This is not a valid argument.
> 
> And the cost to select, buy, and install RAM is much more than USD 10.
> And some systems don't have room for more RAM, etc.

Yet another invalid, nonsensical argument...

>>> In general halving the swapping in with raid10,far is nice for a process, but
>>> for small processes it is not noticeable for a laptop user or a
>>> server user, say http or ftp.
>>
>> Neither is this.  Laptop users don't run RAID10.  And server swap
>> performance is all about page write, not read, as I previously stated.
> 
> Some laptop users and desktop users run raid10. 

And you might find a polar bear or two living in the tropics.

> I think all laptop
> and desktop users should have at least 2 disks and run mirrored raid on
> them, both for security and for performance.

The World According to Keld.  Most laptops can only house one drive.  If
they held two or more, their on-battery run time would be useless.  Most
desktops are sold with only one drive.  Yes, in a perfect world everyone
would be redundant.

> Properly configured servers will benefit from snappy swap read
> performance. Write performance of swap would normally not be noticeable
> - it is not a bottleneck.

Properly configured servers don't swap.  When they need to swap, it's
typically because something has gone wrong.  When this happens, they need
to free pages as quickly as possible.  Once the problem no longer
exists, the speed with which pages are brought back from swap isn't
critical.
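
A crude way to see whether a box is actually swapping is to watch the
si/so columns (memory swapped in from / out to disk per second) in
vmstat over a sustained interval, e.g.:

vmstat 5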

Please put down the shovel.  Your arguments keep digging you into a
deeper hole.  I'm not sure if you'll ever be able to climb out.

-- 
Stan


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2013-05-25  1:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-21 12:51 "Missing" RAID devices Jim Santos
2013-05-21 15:31 ` Phil Turmel
2013-05-21 22:22   ` Jim Santos
2013-05-22  0:02     ` Phil Turmel
2013-05-22  0:16       ` Jim Santos
2013-05-22 22:43       ` Stan Hoeppner
2013-05-22 23:26         ` Phil Turmel
2013-05-23  5:59           ` Stan Hoeppner
2013-05-23  8:30             ` keld
2013-05-24  3:45               ` Stan Hoeppner
2013-05-24  6:32                 ` keld
2013-05-24  7:37                   ` Stan Hoeppner
2013-05-24 17:15                     ` keld
2013-05-24 19:05                       ` Stan Hoeppner
2013-05-24 19:22                         ` keld
2013-05-25  1:42                           ` Stan Hoeppner
2013-05-24  9:23                   ` David Brown
2013-05-24 18:03                     ` keld
2013-05-23  8:22           ` David Brown
2013-05-21 16:23 ` Doug Ledford
2013-05-21 17:03   ` Drew
     [not found]     ` <519BDC8C.1040202@hardwarefreak.com>
2013-05-21 21:02       ` Drew
2013-05-21 22:06         ` Stan Hoeppner
