* Possible to change chunk size on RAID-1 without re-init or destructive result?
@ 2013-03-27  5:30 Jeff Johnson
  2013-03-27  5:56 ` Mikael Abrahamsson
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Jeff Johnson @ 2013-03-27  5:30 UTC (permalink / raw)
  To: linux-raid

Greetings,

I have a RAID-1, two disk volume that was created with a strange chunk
size. For reasons I won't go into here removing the data and
redefining a new RAID-1 volume with a different chunk size is
presently not a viable option.

Since this is a two disk mirror I should, /*theoretically*/ be able to
redefine the chunk size without impacting existing data. I have never
attempted such a thing and am not sure how best to go about it. I know
in a striped data level like RAID-5 or RAID-6 this would absolutely
not be possible.

I know I would have to set a different chunk size in the superblock.
Google and wiki research do not show a clear method to do this.
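For reference, this is roughly how I have been inspecting the array so
far; the device names below are from my setup, so adjust as needed:

  # report the array geometry as md sees it
  mdadm --detail /dev/md1
  # dump the on-disk superblock of one member
  mdadm --examine /dev/sda3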

Any suggestions are appreciated.

--Jeff


-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4170 Morena Boulevard, Suite D - San Diego, CA 92117


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27  5:30 Possible to change chunk size on RAID-1 without re-init or destructive result? Jeff Johnson
@ 2013-03-27  5:56 ` Mikael Abrahamsson
  2013-03-27  6:02 ` Roman Mamedov
  2013-03-27 16:01 ` Roy Sigurd Karlsbakk
  2 siblings, 0 replies; 20+ messages in thread
From: Mikael Abrahamsson @ 2013-03-27  5:56 UTC (permalink / raw)
  To: Jeff Johnson; +Cc: linux-raid

On Tue, 26 Mar 2013, Jeff Johnson wrote:

> I know I would have to set a different chunk size in the superblock. 
> Google and wiki research do not show a clear method to do this.

Result number 5 in my Google search for <mdadm change chunk size> is 
<http://neil.brown.name/blog/20090817000931>. While this is a few years 
old, it says RAID1 can't change chunk size; only RAID5 and RAID6 can.
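
For the levels that do support it, the reshape is, as I understand it,
roughly along these lines (array name and chunk value are only an
example):

  mdadm --grow /dev/md0 --chunk=64 --backup-file=/root/md0-reshape-backup

For RAID1 there is simply no chunk to change.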

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27  5:30 Possible to change chunk size on RAID-1 without re-init or destructive result? Jeff Johnson
  2013-03-27  5:56 ` Mikael Abrahamsson
@ 2013-03-27  6:02 ` Roman Mamedov
  2013-03-27 16:01 ` Roy Sigurd Karlsbakk
  2 siblings, 0 replies; 20+ messages in thread
From: Roman Mamedov @ 2013-03-27  6:02 UTC (permalink / raw)
  To: Jeff Johnson; +Cc: linux-raid

On Tue, 26 Mar 2013 22:30:01 -0700
Jeff Johnson <jeff.johnson@aeoncomputing.com> wrote:

> I have a RAID-1, two disk volume that was created with a strange chunk
> size. For reasons I won't go into here removing the data and
> redefining a new RAID-1 volume with a different chunk size is
> presently not a viable option.

You can always fail and remove one member, make a new degraded RAID1 on it,
'dd' data from one RAID1 to the other, kill the old RAID1, add the remaining
drive to the new RAID1.

Do a full backup beforehand, of course.
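
A rough sketch of those steps, assuming the members are /dev/sda1 and
/dev/sdb1, the old array is /dev/md0 and the new one will be /dev/md1
(all names are placeholders, check them against your own layout first):

  # break the mirror and build a new degraded RAID1 on the freed disk
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 missing

  # copy the data; the new array must be at least as large as the old one
  dd if=/dev/md0 of=/dev/md1 bs=1M

  # retire the old array and pull its remaining disk into the new one
  mdadm --stop /dev/md0
  mdadm --zero-superblock /dev/sda1
  mdadm /dev/md1 --add /dev/sda1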


-- 
With respect,
Roman



* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27  5:30 Possible to change chunk size on RAID-1 without re-init or destructive result? Jeff Johnson
  2013-03-27  5:56 ` Mikael Abrahamsson
  2013-03-27  6:02 ` Roman Mamedov
@ 2013-03-27 16:01 ` Roy Sigurd Karlsbakk
  2013-03-27 16:23   ` Jeff Johnson
  2013-03-27 19:11   ` Stan Hoeppner
  2 siblings, 2 replies; 20+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-03-27 16:01 UTC (permalink / raw)
  To: Jeff Johnson; +Cc: linux-raid

From the manual

       -c, --chunk=
              Specify chunk size in kibibytes.  The default when creating an array is 512KB.  To ensure compatibility with
              earlier versions, the default when Building an array with no persistent metadata is 64KB.  This is only meaningful
              for RAID0, RAID4, RAID5, RAID6, and RAID10.

meaning - chunk size isn't relevant to a mirror

----- Opprinnelig melding -----
> Greetings,
> 
> I have a RAID-1, two disk volume that was created with a strange chunk
> size. For reasons I won't go into here removing the data and
> redefining a new RAID-1 volume with a different chunk size is
> presently not a viable option.
> 
> Since this is a two disk mirror I should, /*theoretically*/ be able to
> redefine the chunk size without impacting existing data. I have never
> attempted such a thing and am not sure how best to go about it. I know
> in a striped data level like RAID-5 or RAID-6 this would absolutely
> not be possible.
> 
> I know I would have to set a different chunk size in the superblock.
> Google and wiki research do not show a clear method to do this.
> 
> Any suggestions are appreciated.
> 
> --Jeff
> 
> 
> --
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
> 
> jeff.johnson@aeoncomputing.com
> www.aeoncomputing.com
> t: 858-412-3810 x101 f: 858-412-3845
> 
> 4170 Morena Boulevard, Suite D - San Diego, CA 92117

-- 
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypical etymology. In most cases adequate and relevant synonyms exist in Norwegian.


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 16:01 ` Roy Sigurd Karlsbakk
@ 2013-03-27 16:23   ` Jeff Johnson
  2013-03-27 16:44     ` Roman Mamedov
  2013-03-27 19:36     ` Stan Hoeppner
  2013-03-27 19:11   ` Stan Hoeppner
  1 sibling, 2 replies; 20+ messages in thread
From: Jeff Johnson @ 2013-03-27 16:23 UTC (permalink / raw)
  To: linux-raid

And yet I have this output from /proc/mdstat:

md1 : active raid1 sdb3[1] sda3[0]
      288567164 blocks super 1.1 [2/2] [UU]
      bitmap: 3/3 pages [12KB], 65536KB chunk

It is very strange. The responsiveness on small-file I/O tends to
support the notion that this mirror really has a 64MB chunk size. This
is practically an order of magnitude larger than what is prudent. The
iowait on simple things like a sync or writing out small files seems
to support what mdstat is reporting. Of course, I'd like to change
this but how to do so without breaking the RAID or risking data is not
obvious.

--Jeff



On Wed, Mar 27, 2013 at 9:01 AM, Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
> From the manual
>
>        -c, --chunk=
>               Specify chunk size in kibibytes.  The default when creating an array is 512KB.  To ensure compatibility with
>               earlier versions, the default when Building an array with no persistent metadata is 64KB.  This is only meaningful
>               for RAID0, RAID4, RAID5, RAID6, and RAID10.
>
> meaning - chunk size isn't relevant to a mirror


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 16:23   ` Jeff Johnson
@ 2013-03-27 16:44     ` Roman Mamedov
  2013-03-27 19:36     ` Stan Hoeppner
  1 sibling, 0 replies; 20+ messages in thread
From: Roman Mamedov @ 2013-03-27 16:44 UTC (permalink / raw)
  To: Jeff Johnson; +Cc: linux-raid

On Wed, 27 Mar 2013 09:23:52 -0700
Jeff Johnson <jeff.johnson@aeoncomputing.com> wrote:

> And yet I have this output from /proc/mdstat:
> 
> md1 : active raid1 sdb3[1] sda3[0]
>       288567164 blocks super 1.1 [2/2] [UU]
>       bitmap: 3/3 pages [12KB], 65536KB chunk
> 
> It is very strange. the responsiveness on small file i/o tends to
> support the notion that this mirror really has a 64MB chunk size. This
> is practically an order of magnitude larger than what is prudent. The
> iowait on simple things like a sync or writing out small files seems
> to support what mdstat is reporting. Of course, I'd like to change
> this but how to do so without breaking the RAID or risking data is not
> obvious.

This is the array _bitmap_ chunk size. In simple terms, it determines the
granularity of array resyncs after unclean shutdowns.

You can change it by

  mdadm --grow /dev/md1 --bitmap=none

  mdadm --grow /dev/md1 --bitmap=internal --bitmap-chunk=131072

But the size you already have is okay; there is no need to change it, and I'd
say certainly no need to lower it (that would decrease performance).
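
If you want to double-check before touching anything, the current
values can be inspected with, for example:

  mdadm --detail /dev/md1            # note: no data chunk size is shown for RAID1
  mdadm --examine-bitmap /dev/sda3   # shows the bitmap chunk and state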

-- 
With respect,
Roman



* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 16:01 ` Roy Sigurd Karlsbakk
  2013-03-27 16:23   ` Jeff Johnson
@ 2013-03-27 19:11   ` Stan Hoeppner
  2013-03-27 19:23     ` Mark Knecht
  1 sibling, 1 reply; 20+ messages in thread
From: Stan Hoeppner @ 2013-03-27 19:11 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Jeff Johnson, linux-raid

On 3/27/2013 11:01 AM, Roy Sigurd Karlsbakk wrote:
> From the manual
> 
>        -c, --chunk=
>               Specify chunk size in kibibytes.  The default when creating an array is 512KB.  To ensure compatibility with
>               earlier versions, the default when Building an array with no persistent metadata is 64KB.  This is only meaningful
>               for RAID0, RAID4, RAID5, RAID6, and RAID10.
> 
> meaning - chunk size isn't relevant to a mirror

The man page should be changed to explicitly state that --chunk does not
apply to RAID1 or --linear arrays.  Far too many people are having
trouble with this.  For those people the current docs are apparently too
subtle.

-- 
Stan



* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 19:11   ` Stan Hoeppner
@ 2013-03-27 19:23     ` Mark Knecht
  2013-03-27 20:10       ` Stan Hoeppner
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Knecht @ 2013-03-27 19:23 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On Wed, Mar 27, 2013 at 12:11 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 3/27/2013 11:01 AM, Roy Sigurd Karlsbakk wrote:
>> From the manual
>>
>>        -c, --chunk=
>>               Specify chunk size in kibibytes.  The default when creating an array is 512KB.  To ensure compatibility with
>>               earlier versions, the default when Building an array with no persistent metadata is 64KB.  This is only meaningful
>>               for RAID0, RAID4, RAID5, RAID6, and RAID10.
>>
>> meaning - chunk size isn't relevant to a mirror
>
> The man page should be changed to explicitly state that --chunk does not
> apply to RAID1 or --linear arrays.  Far too many people are having
> trouble with this.  For those people the current docs are apparently too
> subtle.
>
> --
> Stan


'Subtle' is probably a good word. For me it took a few readings of
the man page, after looking at what I have here, where it tells me
nothing about chunk size on the RAID1.

Note that another level of understanding (which I don't have) has to
do with getting chunk sizes that work well for my needs. That's a
whole other kettle of fish...

Cheers,
Mark

mark@c2stable ~ $ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md6 : active raid5 sdc6[1] sdd6[2] sdb6[0]
      494833664 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/2 pages [0KB], 65536KB chunk

md3 : active raid6 sdd3[2] sdc3[1] sdb3[0] sde3[3] sdf3[5]
      157305168 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]

md7 : active raid6 sdd7[2] sdc7[1] sdb7[0] sde2[3] sdf2[4]
      395387904 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]

md126 : active raid1 sdb5[0] sdd5[2] sdc5[1]
      52436032 blocks [3/3] [UUU]

unused devices: <none>
mark@c2stable ~ $


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 16:23   ` Jeff Johnson
  2013-03-27 16:44     ` Roman Mamedov
@ 2013-03-27 19:36     ` Stan Hoeppner
  1 sibling, 0 replies; 20+ messages in thread
From: Stan Hoeppner @ 2013-03-27 19:36 UTC (permalink / raw)
  To: Jeff Johnson; +Cc: linux-raid

On 3/27/2013 11:23 AM, Jeff Johnson wrote:

>       bitmap: 3/3 pages [12KB], 65536KB chunk

The above has ZERO to do with what you report below.

> It is very strange. the responsiveness on small file i/o tends to
> support the notion that this mirror really has a 64MB chunk size. This

No it doesn't.  You lack understanding of how disk mirroring works.

> is practically an order of magnitude larger than what is prudent. The
> iowait on simple things like a sync or writing out small files seems
> to support what mdstat is reporting. Of course, I'd like to change
> this but how to do so without breaking the RAID or risking data is not
> obvious.

The IO latency you describe is unrelated to md.  It is most often caused
by filesystem free space fragmentation.  How full is the filesystem in
question?  If over 90 percent, odds are good that this is your problem.
If it's an XFS filesystem I can give you commands to show the free space
map, which will answer the question definitively.  If EXTx you're on
your own as I don't know the tools (if they exist).

This IO latency can also be caused if hardware (disk/controller) is
malfunctioning.  Look at 'smartctl -A' output for both drives.
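
For reference, the sort of thing I mean, assuming the filesystem sits
on /dev/md1 and the members are sda/sdb (adjust device names):

  # XFS free space histogram, read-only
  xfs_db -r -c "freesp -s" /dev/md1

  smartctl -A /dev/sda
  smartctl -A /dev/sdb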

-- 
Stan



* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 19:23     ` Mark Knecht
@ 2013-03-27 20:10       ` Stan Hoeppner
  2013-03-27 21:06         ` Mark Knecht
  0 siblings, 1 reply; 20+ messages in thread
From: Stan Hoeppner @ 2013-03-27 20:10 UTC (permalink / raw)
  To: Mark Knecht; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On 3/27/2013 2:23 PM, Mark Knecht wrote:

> Note that another level of understanding (which I don't have) has to
> do with getting chunk sizes that work well for my needs. That's a
> whole other kettle of fish...
...
> mark@c2stable ~ $ cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md6 : active raid5 sdc6[1] sdd6[2] sdb6[0]
>       494833664 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>       bitmap: 0/2 pages [0KB], 65536KB chunk
> 
> md3 : active raid6 sdd3[2] sdc3[1] sdb3[0] sde3[3] sdf3[5]
>       157305168 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]
> 
> md7 : active raid6 sdd7[2] sdc7[1] sdb7[0] sde2[3] sdf2[4]
>       395387904 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]
> 
> md126 : active raid1 sdb5[0] sdd5[2] sdc5[1]
>       52436032 blocks [3/3] [UUU]

Your problem isn't chunk sizes, but likely the 4 md/RAID devices atop
the same set of physical disks.  If you have workloads that are
accessing these md devices concurrently that will tend to wreak havoc
WRT readahead, the elevator, and thus the disk head actuators.  If these
are low RPM 'green' drives it will be exacerbated due to the slow
spindle speed.

The purpose of RAID is to prevent data loss when a drive fails.  The
purpose of striped RAID is to add performance atop that.  Thus you
normally have one RAID per set of physical disks.  The Linux md/RAID
driver allows you to stack multiple RAIDs atop one set of disks, thus
shooting yourself in the foot.  Look at any hardware RAID card, SAN
controller, etc, and none of them allow this--only one RAID per disk set.

At this point you obviously don't want to blow away your current setup,
create one array and restore, as you probably don't have backups.
Reshaping with different chunk sizes won't gain you anything either.  So
about the only things you can optimize at this point are your elevator
and disk settings such as nr_requests and read_ahead_kb.  Switching from
CFQ to deadline could help quite a lot.
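
A sketch of where those knobs live, per disk; sdb is just an example
and the values are starting points to experiment with, not
recommendations:

  echo deadline > /sys/block/sdb/queue/scheduler
  echo 128 > /sys/block/sdb/queue/nr_requests
  echo 512 > /sys/block/sdb/queue/read_ahead_kb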

-- 
Stan



* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 20:10       ` Stan Hoeppner
@ 2013-03-27 21:06         ` Mark Knecht
  2013-03-27 22:08           ` Stan Hoeppner
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Knecht @ 2013-03-27 21:06 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On Wed, Mar 27, 2013 at 1:10 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 3/27/2013 2:23 PM, Mark Knecht wrote:
>
>> Note that another level of understanding (which I don't have) has to
>> do with getting chunk sizes that work well for my needs. That's a
>> whole other kettle of fish...
> ...
>> mark@c2stable ~ $ cat /proc/mdstat
>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
>> md6 : active raid5 sdc6[1] sdd6[2] sdb6[0]
>>       494833664 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>       bitmap: 0/2 pages [0KB], 65536KB chunk
>>
>> md3 : active raid6 sdd3[2] sdc3[1] sdb3[0] sde3[3] sdf3[5]
>>       157305168 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]
>>
>> md7 : active raid6 sdd7[2] sdc7[1] sdb7[0] sde2[3] sdf2[4]
>>       395387904 blocks super 1.2 level 6, 16k chunk, algorithm 2 [5/5] [UUUUU]
>>
>> md126 : active raid1 sdb5[0] sdd5[2] sdc5[1]
>>       52436032 blocks [3/3] [UUU]
>
> Your problem isn't chunk sizes, but likely the 4 md/RAID devices atop
> the same set of physical disks.  If you have workloads that are
> accessing these md devices concurrently that will tend to wreak havoc
> WRT readahead, the elevator, and thus the disk head actuators.  If these
> are low RPM 'green' drives it will be exacerbated due to the slow
> spindle speed.
>

The drives are WD RE3 so at least I have that in my favor.

I started learning these lessons about multiple RAIDs on one set of
physical disks after building the machine. The plan I'm moving _very_
slowly toward is migrating from the md126 RAID1 which is currently
root to the md3 RAID6. I've built a new, bootable Gentoo install on
the RAID6. It's up and running and basically I think I just need to
move my user account and the stuff in /home and I'm there. With that
done md126 is gone.

md7 is manageable. It's all Virtualbox VMs, which I back up externally
every week so I can do a backup of that, delete md126 & md7 and then
(hopefully) resize md3 larger.
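
(I assume the grow itself, once the underlying partitions have been
enlarged, would be roughly

  mdadm --grow /dev/md3 --size=max
  resize2fs /dev/md3

but I'll read up before actually trying it.)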

md6 isn't used much. I mount it, do quick backups to it, and then
unmount. It's used about once a day and not in use most of the time. I
could probably get rid of it completely but I'd want another external
drive to replace it. Anyway, it's not overly important one way or
another.

All that said, I still don't really know if I was starting over today
how to choose a new chunk size. That still eludes me. I've sort of
decided that's one of those things that make you guys pros and me just
a user. :-)

Cheers,
Mark




> The purpose of RAID is to prevent data loss when a drive fails.  The
> purpose of striped RAID is to add performance atop that.  Thus you
> normally have one RAID per set of physical disks.  The Linux md/RAID
> driver allows you to stack multiple RAIDs atop one set of disks, thus
> shooting yourself in the foot.  Look at any hardware RAID card, SAN
> controller, etc, and none of them allow this--only one RAID per disk set.
>
> At this point you obviously don't want to blow away your current setup,
> create one array and restore, as you probably don't have backups.
> Reshaping with different chunk sizes won't gain you anything either.  So
> about the only things you can optimize at this point are your elevator
> and disk settings such as nr_requests and read_ahead_kb.  Switching from
> CFQ to deadline could help quite a lot.
>
> --
> Stan
>


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 21:06         ` Mark Knecht
@ 2013-03-27 22:08           ` Stan Hoeppner
  2013-03-27 22:18             ` Mark Knecht
  0 siblings, 1 reply; 20+ messages in thread
From: Stan Hoeppner @ 2013-03-27 22:08 UTC (permalink / raw)
  To: Mark Knecht; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On 3/27/2013 4:06 PM, Mark Knecht wrote:

> All that said, I still don't really know if I was starting over today
> how to choose a new chunk size. That still eludes me. I've sort of
> decided that's one of those things that make you guys pros and me just
> a user. :-)

Chunk size is mostly dictated by your workload IO patterns, and the
number and latency of your spindles.  If you're doing mostly small
random IOs, mixed IOs, or metadata-heavy workloads, you typically want a
small chunk size, especially if the array is parity (5/6).  RMW is the
performance killer here, so you want to minimize it--a small chunk size does
this.  If you're doing mostly large streaming writes then you want a
larger chunk size to improve IO efficiency in the elevator and the
drive's write caches, command queuing, etc.  The filesystem you use, and
how it arranges inodes/extents across sectors, can play a role as well.

When in doubt, use a small chunk size.  The reason is this:  a large
chunk can drive small random IO performance into the dirt if you're
using parity or really low RPM low IOPS drives, but a small chunk will
not have anywhere close to the same negative impact on large streaming IO.

A 5 drive RE4 RAID6 array with a 16KB chunk, 48KB stripe, is about as
small as you'd want to go.  It's optimal for small random IO, but it's
probably a bit too small for a mixed workload, and definitely too small
for streaming.  With only 3 slow spindles, a 32KB or even 64KB chunk may
be more optimal, yielding a 96KB or 192KB stripe.  This depends, again,
on your workload(s).  If most of your write IOs are between 48-96KB then
use a 16KB chunk.  If most are between 96-192KB use a 32KB chunk.  If
between 192-384KB then use a 64KB chunk, and so on.
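
For example, creating a 5-drive RAID6 with a 16KB chunk from scratch
would look something like this, with placeholder device names:

  mdadm --create /dev/md0 --level=6 --raid-devices=5 --chunk=16 /dev/sd[b-f]1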

If you're using SSDs the game changes quite a bit as neither random IO
nor RMW latency is an issue.  With SSD, when in doubt, use a large chunk
size, preferably equal to the erase block size, or a power of 2 fraction
of it.

-- 
Stan



* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 22:08           ` Stan Hoeppner
@ 2013-03-27 22:18             ` Mark Knecht
  2013-03-31 15:56               ` Stan Hoeppner
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Knecht @ 2013-03-27 22:18 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On Wed, Mar 27, 2013 at 3:08 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 3/27/2013 4:06 PM, Mark Knecht wrote:
>
>> All that said, I still don't really know if I was starting over today
>> how to choose a new chunk size. That still eludes me. I've sort of
>> decided that's one of those things that make you guys pros and me just
>> a user. :-)
>
> Chunk size is mostly dictated by your workload IO patterns, and the
> number and latency of your spindles.

Is there a way for me to measure, say over a whole day or some fixed
time, what the workload really looks like?

The machine is a basic Gentoo desktop machine running KDE. The only
workload where I really care about performance is that I run a bunch
of Virtualbox Win 7 & Win XP VMs where I need the performance to be
as good as I can reasonably get. The problem I have is these VMs are
either 1 huge file (40-50GB in a single file) or many 2GB files. I
haven't a clue how Windows & Virtualbox is accessing what it sees as a
virtual drive and then underlying that how the vbox drivers are using
the system to get to the RAID.

It would be interesting to set some program running, probably on a
weekend or sometime when performance isn't so critical, and see what
sort of data gets collected, assuming there's a program that does that
sort of thing.
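(I'm guessing something like iostat from the sysstat package, logging
per-device numbers at a fixed interval, e.g.

  iostat -dxk 60 > /tmp/io-$(date +%F).log

but maybe that's the wrong approach.)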

Thanks,
Mark


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-27 22:18             ` Mark Knecht
@ 2013-03-31 15:56               ` Stan Hoeppner
  2013-03-31 17:15                 ` Mark Knecht
  0 siblings, 1 reply; 20+ messages in thread
From: Stan Hoeppner @ 2013-03-31 15:56 UTC (permalink / raw)
  To: Mark Knecht; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On 3/27/2013 5:18 PM, Mark Knecht wrote:
> On Wed, Mar 27, 2013 at 3:08 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 3/27/2013 4:06 PM, Mark Knecht wrote:
>>
>>> All that said, I still don't really know if I was starting over today
>>> how to choose a new chunk size. That still eludes me. I've sort of
>>> decided that's one of those things that make you guys pros and me just
>>> a user. :-)
>>
>> Chunk size is mostly dictated by your workload IO patterns, and the
>> number and latency of your spindles.
> 
> Is there a way for me to measure, say over a whole day or some fixed
> time, what the workload really looks like?

That's not the way to go about this.

> The machine is a basic Gentoo desktop machine running KDE. The only
> workload where I really care about performance is that I run a bunch
> of Virtualbox Win 7 & Win XP VMs where I need the performance to be
> as good as I can reasonably get. The problem I have is these VMs are
> either 1 huge file (40-50GB in a single file) or many 2GB files. I
> haven't a clue how Windows & Virtualbox is accessing what it sees as a
> virtual drive and then underlying that how the vbox drivers are using
> the system to get to the RAID.

So you have a bunch of Windows VM guests that write to large sparse
files residing on what, EXT4?  NTFS block size is 4KB so that's your
smallest IO.

> It would be interesting to set some program running, probably on a
> weekend or sometime when performance isn't so critical, and see what
> sort of data gets collected, assuming there's a program that does that
> sort of thing.

Again, that's not the way to approach this.  What would be informative
to know is what applications you're running in these Windows VMs.  The
application dictates the write pattern.  You don't need a "collector" to
tell you that.  You just need to know the application(s).  If you're
just running productivity apps (web/mail/pdf/etc) inside these VMs then
there's nothing to optimize WRT RAID stripe parameters as you have no
sustained write IO.  So what are the Windows apps?

-- 
Stan




* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-31 15:56               ` Stan Hoeppner
@ 2013-03-31 17:15                 ` Mark Knecht
  2013-03-31 17:41                   ` Stan Hoeppner
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Knecht @ 2013-03-31 17:15 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On Sun, Mar 31, 2013 at 8:56 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 3/27/2013 5:18 PM, Mark Knecht wrote:
<SNIP>
>> Is there a way for me to measure, say over a whole day or some fixed
>> time, what the workload really looks like?
>
> That's not the way to go about this.
>
OK

>> The machine is a basic Gentoo desktop machine running KDE. The only
>> workload where I really care about performance is that I run a bunch
>> of Virtualbox Win 7 & Win XP VMs where I need the performance to be
>> as good as I can reasonably get. The problem I have is these VMs are
>> either 1 huge file (40-50GB in a single file) or many 2GB files. I
>> haven't a clue how Windows & Virtualbox is accessing what it sees as a
>> virtual drive and then underlying that how the vbox drivers are using
>> the system to get to the RAID.
>
> So you have a bunch of Windows VM guests that write to large sparse
> files residing on what, EXT4?  NTFS block size is 4KB so that's your
> smallest IO.
>

Currently EXT3 based on my starting point 2 years ago and never having
changed. I'm open to EXT4 if this discussion shows me it warrants the
work. Would rather not deal with anything more exotic right now.

>> It would be interesting to set some program running, probably on a
>> weekend or sometime when performance isn't so critical, and see what
>> sort of data gets collected, assuming there's a program that does that
>> sort of thing.
>
> Again, that's not the way to approach this.  What would be informative
> to know is what applications you're running in these Windows VMs.  The
> application dictates the write pattern.  You don't need a "collector" to
> tell you that.  You just need to know the application(s).  If you're
> just running productivity apps (web/mail/pdf/etc) inside these VMs then
> there's nothing to optimize WRT RAID stripe parameters as you have no
> sustained write IO.  So what are the Windows apps?

Currently 3 VMs, but only 2 matter for performance. The one that
doesn't matter is a VMWare Player VM used for things like watching
Netflix & Hulu. Nothing much more than that. 1 CPU core dedicated. CPU
usage is generally low. I haven't paid much attention to disk usage
for this VM but will check it out.

Performance VMs:

1) This first VM primarily runs TradeStation, a rules-based trading
platform for trading stocks & futures. I generally run with 2-4 CPU
cores and it almost never uses much computational power. The big deal in
this VM is stock data caching with years or even decades of data for
each stock or futures contract. Currently this cache appears to be
sitting in a single file which is about 3GB in size. This data streams
into the VM over the net when the markets are open (pretty much 24/7)
and the cache grows. Depending on the type of market and chart the
data might be as fine grained as each individual trade taking place
that day, or it might only be updated once every bar. (1 minute bar, 5
minute bar, daily bar, etc.) TradeStation reads the cache as it needs
data. I have no idea what the access looks like in real time but
generally I expect that it's accessing the data in date order. Whether
the data is sorted or not in this cache file I have no idea.

2) This second VM is more computational in nature. It primarily runs
two apps for long periods of time, although I don't think either app
is all that disk intensive. Both apps read market data once from disk,
cache it in memory and then compute for hours to days depending on
what I'm asking them to do. I will say I don't see a lot of disk
activity lights when either of these programs are running.

- Adaptrade Builder - a genetic optimization program that attempts to
generate TradeStation EasyLanguage trading strategies. I believe that
once it has the market data in memory it's using memory and disk to
store interesting strategies for me to look at later. The output of
the program is generally a single file ranging in size from 1MB to
maybe 50MB.

- TradingSolutions - a neural network program that attempts to
generate neural network models for trading markets. Each instance of
this program (I typically run 2-3 instances) generally has access to
one file sized 25MB-200MB plus a lot (50-100) small files under 20K in
size. I have no idea how often any of these files are read or
written. The program runs for hours doing its work.

I suppose there are other things that happen in the VMs. I run Excel a
lot, but it's not a lot of data.

Hopefully that gives you enough info to suggest a direction.

Thanks,
Mark


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-31 17:15                 ` Mark Knecht
@ 2013-03-31 17:41                   ` Stan Hoeppner
  2013-03-31 17:56                     ` Mark Knecht
  0 siblings, 1 reply; 20+ messages in thread
From: Stan Hoeppner @ 2013-03-31 17:41 UTC (permalink / raw)
  To: Mark Knecht; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On 3/31/2013 12:15 PM, Mark Knecht wrote:
> On Sun, Mar 31, 2013 at 8:56 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 3/27/2013 5:18 PM, Mark Knecht wrote:
> <SNIP>
>>> Is there a way for me to measure, say over a whole day or some fixed
>>> time, what the workload really looks like?
>>
>> That's not the way to go about this.
>>
> OK
> 
>>> The machine is a basic Gentoo desktop machine running KDE. The only
>>> workload where I really care about performance is that I run a bunch
>>> of Virtualbox Win 7 & Win XP VMs where I need the performance to be
>>> as good as I can reasonably get. The problem I have is these VMs are
>>> either 1 huge file (40-50GB in a single file) or many 2GB files. I
>>> haven't a clue how Windows & Virtualbox is accessing what it sees as a
>>> virtual drive and then underlying that how the vbox drivers are using
>>> the system to get to the RAID.
>>
>> So you have a bunch of Windows VM guests that write to large sparse
>> files residing on what, EXT4?  NTFS block size is 4KB so that's your
>> smallest IO.
>>
> 
> Currently EXT3 based on my starting point 2 years ago and never having
> changed. I'm open to EXT4 if this discussion shows me it warrants the
> work. Would rather not deal with anything more exotic right now.

Doesn't make a difference here.

>>> It would be interesting to set some program running, probably on a
>>> weekend or sometime when performance isn't so critical, and see what
>>> sort of data gets collected, assuming there's a program that does that
>>> sort of thing.
>>
>> Again, that's not the way to approach this.  What would be informative
>> to know is what applications you're running in these Windows VMs.  The
>> application dictates the write pattern.  You don't need a "collector" to
>> tell you that.  You just need to know the application(s).  If you're
>> just running productivity apps (web/mail/pdf/etc) inside these VMs then
>> there's nothing to optimize WRT RAID stripe parameters as you have no
>> sustained write IO.  So what are the Windows apps?
> 
> Currently 3 VMs, but only 2 matter for performance. The one that
> doesn't matter is a VMWare Player VM used for things like watching
> Netflix & Hulu. Nothing much more than that. 1 CPU core dedicated. CPU
> usage is generally low. I haven't paid much attention to disk usage
> for this VM but will check it out.
> 
> Performance VMs:
> 
> 1) This first VM primarily runs TradeStation, a rules-based trading
> platform for trading stocks & futures. I generally run with 2-4 CPU
> cores and almost never uses much computational power. The big deal in
> this VM is stock data caching with years or even decades of data for
> each stock or futures contract. Currently this cache appears to be
> sitting in a single file which is about 3GB in size. This data streams
> into the VM over the net when the markets are open (pretty much 24/7)
> and the cache grows. Depending on the type of market and chart the
> data might be as fine grained as each individual trade taking place
> that day, or it might only be updated once every bar. (1 minute bar, 5
> minute bar, daily bar, etc.) TradeStation reads the cache as it needs
> data. I have no idea what the access looks like in real time but
> generally I expect that it's accessing the data in date order. Whether
> the data is sorted or not in this cache file I have no idea.
> 
> 2) This second VM is more computational in nature. It primarily runs
> two apps for long periods of time, although I don't think either app
> is all that disk intensive. Both apps read market data once from disk,
> cache it in memory and then compute for hours to days depending on
> what I'm asking them to do. I will say I don't see a lot of disk
> activity lights when either of these programs are running.
> 
> - Adaptrade Builder - a genetic optimization program that attempts to
> generate TradeStation EasyLanguage trading strategies. I believe that
> once it has the market data in memory it's using memory and disk to
> store interesting strategies for me to look at later. The output of
> the program is generally a single file ranging in size from 1MB to
> maybe 50MB.
> 
> - TradingSolutions - a neural network program that attempts to
> generate neural network models for trading markets. Each instance of
> this program (I typically run 2-3 instances) generally has access to
> one file sized 25MB-200MB plus a lot (50-100) small files under 20K in
> size. I have no idea how often any of these files are read or
> written. The program runs for hours doing its work.
> 
> I suppose there are other things that happen in the VMs. I run Excel a
> lot, but it's not a lot of data.
> 
> Hopefully that gives you enough info to suggest a direction.

These applications append small data slowly over a long period of time,
which usually means fragmentation.  Thus there's not much to optimize at
the chunk/stripe level, other than keeping chunk size small to spread
random reads over all platters.  You currently have a 16KB chunk, IIRC,
which is about as good as you'll get for this workload.  Given your
applications' low write throughput chunk/strip really doesn't matter.

-- 
Stan



* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-31 17:41                   ` Stan Hoeppner
@ 2013-03-31 17:56                     ` Mark Knecht
  2013-04-01  0:28                       ` Stan Hoeppner
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Knecht @ 2013-03-31 17:56 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On Sun, Mar 31, 2013 at 10:41 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 3/31/2013 12:15 PM, Mark Knecht wrote:
<SNIP>
>>
>> Hopefully that gives you enough info to suggest a direction.
>
> These applications append small data slowly over a long period of time,
> which usually means fragmentation.  Thus there's not much to optimize at
> the chunk/stripe level, other than keeping chunk size small to spread
> random reads over all platters.  You currently have a 16KB chunk, IIRC,
> which is about as good as you'll get for this workload.  Given your
> applications' low write throughput chunk/strip really doesn't matter.
>
> --
> Stan
>

OK, I cannot argue with your conclusions and will stick with 16K for now.

Presumably if any improvement is to be made here its getting
everything onto a single partition instead of multiple RAIDs on the
same drives which then reduces the physical overhead (moving heads to
different partitions) and allows the md software to do the heavy
lifting?

Thanks,
Mark


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-03-31 17:56                     ` Mark Knecht
@ 2013-04-01  0:28                       ` Stan Hoeppner
  2013-04-01 16:46                         ` Mark Knecht
  0 siblings, 1 reply; 20+ messages in thread
From: Stan Hoeppner @ 2013-04-01  0:28 UTC (permalink / raw)
  To: Mark Knecht; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On 3/31/2013 12:56 PM, Mark Knecht wrote:
> On Sun, Mar 31, 2013 at 10:41 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> On 3/31/2013 12:15 PM, Mark Knecht wrote:
> <SNIP>
>>>
>>> Hopefully that gives you enough info to suggest a direction.
>>
>> These applications append small data slowly over a long period of time,
>> which usually means fragmentation.  Thus there's not much to optimize at
>> the chunk/stripe level, other than keeping chunk size small to spread
>> random reads over all platters.  You currently have a 16KB chunk, IIRC,
>> which is about as good as you'll get for this workload.  Given your
>> applications' low write throughput chunk/strip really doesn't matter.
>>
>> --
>> Stan
>>
> 
> OK, I cannot argue with your conclusions and will stick with 16K for now.
> 
> Presumably if any improvement is to be made here it's getting
> everything onto a single partition instead of multiple RAIDs on the
> same drives which then reduces the physical overhead (moving heads to
> different partitions) and allows the md software to do the heavy
> lifting?

Your write IO rate appears to be so low that it really makes no
difference.  I'd guess you could run all of this from a single fast disk
drive (10/15K or SSD) without skipping a beat.

-- 
Stan




* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-04-01  0:28                       ` Stan Hoeppner
@ 2013-04-01 16:46                         ` Mark Knecht
  2013-04-02  1:15                           ` Brad Campbell
  0 siblings, 1 reply; 20+ messages in thread
From: Mark Knecht @ 2013-04-01 16:46 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On Sun, Mar 31, 2013 at 5:28 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
> On 3/31/2013 12:56 PM, Mark Knecht wrote:
>> On Sun, Mar 31, 2013 at 10:41 AM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>>> On 3/31/2013 12:15 PM, Mark Knecht wrote:
>> <SNIP>
>>>>
>>>> Hopefully that gives you enough info to suggest a direction.
>>>
>>> These applications append small data slowly over a long period of time,
>>> which usually means fragmentation.  Thus there's not much to optimize at
>>> the chunk/stripe level, other than keeping chunk size small to spread
>>> random reads over all platters.  You currently have a 16KB chunk, IIRC,
>>> which is about as good as you'll get for this workload.  Given your
>>> applications' low write throughput chunk/strip really doesn't matter.
>>>
>>> --
>>> Stan
>>>
>>
>> OK, I cannot argue with your conclusions and will stick with 16K for now.
>>
>> Presumably if any improvement is to be made here it's getting
>> everything onto a single partition instead of multiple RAIDs on the
>> same drives which then reduces the physical overhead (moving heads to
>> different partitions) and allows the md software to do the heavy
>> lifting?
>
> Your write IO rate appears to be so low that it really makes no
> difference.  I'd guess you could run all of this from a single fast disk
> drive (10/15K or SSD) without skipping a beat.
>
> --
> Stan
>
>

So maybe the idea I had awhile back about moving the VMs to the SSD -
the VMs are about 90GB, the SSD is 128GB - and then at the end of
every day just copying the VMs over to the RAID as a backup - would be
a better way to run?

There was a thread here some time ago about using an SSD as a cache for
RAID. I suppose that's a possibility but it sounds like more
complexity than I need or want.

Thanks,
Mark


* Re: Possible to change chunk size on RAID-1 without re-init or destructive result?
  2013-04-01 16:46                         ` Mark Knecht
@ 2013-04-02  1:15                           ` Brad Campbell
  0 siblings, 0 replies; 20+ messages in thread
From: Brad Campbell @ 2013-04-02  1:15 UTC (permalink / raw)
  To: Mark Knecht; +Cc: Stan Hoeppner, Roy Sigurd Karlsbakk, Jeff Johnson, Linux-RAID

On 02/04/13 00:46, Mark Knecht wrote:
> On Sun, Mar 31, 2013 at 5:28 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:

> So maybe the idea I had awhile back about moving the VMs to the SSD -
> the VMs are about 90GB, the SSD is 128GB - and then at the end of
> every day just copying the VMs over to the RAID as a backup - would be
> a better way to run?

I do this using rsync to give me 5 days of rotating VM backup. I don't 
use snapshotting or any other data-consistency measures, and I routinely 
restore the backups onto a test machine and have (probably luckily) not 
encountered any consistency issues a quick fsck won't fix up.

As a double backup, I also use either Windows backup or rsync inside the 
virtual guests to perform full system backups.
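
A minimal sketch of that kind of rotation, assuming the VMs live under
/ssd/vms and the backups go to the RAID; the paths and weekday naming
are just placeholders:

  # keep one copy per weekday, overwritten a week later
  mkdir -p /raid/vm-backups/$(date +%a)
  rsync -a --delete /ssd/vms/ /raid/vm-backups/$(date +%a)/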


