* mdadm does not create partition devices whatsoever, "partitionable" functionality broken
@ 2011-05-13 15:13 Christopher White
  2011-05-13 16:49 ` Phil Turmel
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher White @ 2011-05-13 15:13 UTC (permalink / raw)
  To: linux-raid

Greetings.

I have spent TEN hours trying everything other than regressing to a 
REALLY old version. I started out on 3.1.4 and have also tried manually 
upgrading to 3.2.1, but the bug still exists.

Somewhere along the way, the "auto" partitionable flag has broken.

sudo mdadm --create --level=raid5 --auto=part2 /dev/md1 --metadata=1.2 
--raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2

This only creates /dev/md1. It is of course possible to create one big 
partition as /dev/md1p1 with any partitioning program, but FORGET about 
trying to create /dev/md1p2.

The problem is that the RAID array is NOT created in partitionable mode, 
and only supports one large partition, despite ALL attempts at EVERY 
format of the --auto option, you name it, -a part2, --auto=mdp2, 
--auto=part2, --auto=p2, --auto=mdp, --auto=part, --auto=p, --auto=p4, 
you name it and I've tried it!

My guess is the functionality of creating partitionable arrays literally 
DID break somewhere prior to/at version 3.1.4 which is the earliest 
version I tried.

I'm giving up and creating physical n-1 sized partitions on the source 
disks and creating two RAID 5 arrays from those partitions instead, but 
decided I really MUST report this bug so that other people don't bang 
their head against the wall for ten hours of their life as well. ;-)


Christopher


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 15:13 mdadm does not create partition devices whatsoever, "partitionable" functionality broken Christopher White
@ 2011-05-13 16:49 ` Phil Turmel
  2011-05-13 17:18   ` Christopher White
  0 siblings, 1 reply; 21+ messages in thread
From: Phil Turmel @ 2011-05-13 16:49 UTC (permalink / raw)
  To: Christopher White; +Cc: linux-raid

Hi Christopher,

On 05/13/2011 11:13 AM, Christopher White wrote:
> Greetings.
> 
> I have spent TEN hours trying everything other than regressing to a REALLY old version. I started out on 3.1.4 and have also tried manually upgrading to 3.2.1, but the bug still exists.
> 
> Somewhere along the way, the "auto" partitionable flag has broken.
> 
> sudo mdadm --create --level=raid5 --auto=part2 /dev/md1 --metadata=1.2 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
> 
> This only creates /dev/md1. It is of course possible to create one big partition as /dev/md1p1 with any partitioning program, but FORGET about trying to create /dev/md1p2.

What exactly did fdisk or parted report when you tried to partition /dev/md1 ?

> The problem is that the RAID array is NOT created in partitionable mode, and only supports one large partition, despite ALL attempts at EVERY format of the --auto option, you name it, -a part2, --auto=mdp2, --auto=part2, --auto=p2, --auto=mdp, --auto=part, --auto=p, --auto=p4, you name it and I've tried it!
> 
> My guess is the functionality of creating partitionable arrays literally DID break somewhere prior to/at version 3.1.4 which is the earliest version I tried.

The mdadm <==> kernel interface for this might be broken, but as a side-effect of the change to make all md devices support conventional partition tables.  I don't recall exactly when this changed, but it was several kernels ago.

What kernel are you running?

> I'm giving up and creating physical n-1 sized partitions on the source disks and creating two RAID 5 arrays from those partitions instead, but decided I really MUST report this bug so that other people don't bang their head against the wall for ten hours of their life as well. ;-)

Consider trying "mdadm --create" without the "--auto" option at all, then fdisk on the resulting array.

Phil

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 16:49 ` Phil Turmel
@ 2011-05-13 17:18   ` Christopher White
  2011-05-13 17:32     ` Christopher White
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher White @ 2011-05-13 17:18 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

Hi Phil, thanks for the response!

On 5/13/11 6:49 PM, Phil Turmel wrote:
> Hi Christopher,
>
> On 05/13/2011 11:13 AM, Christopher White wrote:
>> Greetings.
>>
>> I have spent TEN hours trying everything other than regressing to a REALLY old version. I started out on 3.1.4 and have also tried manually upgrading to 3.2.1, but the bug still exists.
>>
>> Somewhere along the way, the "auto" partitionable flag has broken.
>>
>> sudo mdadm --create --level=raid5 --auto=part2 /dev/md1 --metadata=1.2 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
>>
>> This only creates /dev/md1. It is of course possible to create one big partition as /dev/md1p1 with any partitioning program, but FORGET about trying to create /dev/md1p2.
> What exactly did fdisk or parted report when you tried to partition /dev/md1 ?
I run "sudo gparted /dev/md1" to access the whole RAID array, since I 
like the GUI for precisely creating partitions. When making two ext4 
partitions and applying the changes, it successfully creates /dev/md1p1 
(which does not exist before this operation is performed). It then goes 
on to try to create md1p2 and sends the commands to the md1 
device, but md1p2 is never created. After the step of creating the 
partition (which failed, but gparted does not know that), it tries to 
set up the file system, which fails since there is no md1p2:
mkfs.ext4 -j -O extent -L "" /dev/md1p2
"mke2fs 1.41.14 (22-Dec-2010)
Could not stat /dev/md1p2 --- No such file or directory"
>> The problem is that the RAID array is NOT created in partitionable mode, and only supports one large partition, despite ALL attempts at EVERY format of the --auto option, you name it, -a part2, --auto=mdp2, --auto=part2, --auto=p2, --auto=mdp, --auto=part, --auto=p, --auto=p4, you name it and I've tried it!
>>
>> My guess is the functionality of creating partitionable arrays literally DID break somewhere prior to/at version 3.1.4 which is the earliest version I tried.
> The mdadm<==>  kernel interface for this might be broken, but as a side-effect of the change to make all md devices support conventional partition tables.  I don't recall exactly when this changed, but it was several kernels ago.
>
> What kernel are you running?
Linux Mint 11 RC, which uses 2.6.38-8-generic.
>> I'm giving up and creating physical n-1 sized partitions on the source disks and creating two RAID 5 arrays from those partitions instead, but decided I really MUST report this bug so that other people don't bang their head against the wall for ten hours of their life as well. ;-)
> Consider trying "mdadm --create" without the "--auto" option at all, then fdisk on the resulting array.
>
> Phil
I've tried that as well during my testing, since some postings suggested 
that leaving out the option would create a partitionable array, but it 
didn't help.


Christopher

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 17:18   ` Christopher White
@ 2011-05-13 17:32     ` Christopher White
  2011-05-13 17:40       ` Roman Mamedov
  2011-05-13 17:43       ` Phil Turmel
  0 siblings, 2 replies; 21+ messages in thread
From: Christopher White @ 2011-05-13 17:32 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

I forgot to mention that I've also tried "sudo fdisk /dev/md1" and 
creating two partitions that way. It fails too.

This leads me to conclude that /dev/md1 was never created in 
partitionable mode and that the kernel refuses to create anything beyond 
a single partition on it.

On 5/13/11 7:18 PM, Christopher White wrote:
> Hi Phil, thanks for the response!
>
> On 5/13/11 6:49 PM, Phil Turmel wrote:
>> Hi Christopher,
>>
>> On 05/13/2011 11:13 AM, Christopher White wrote:
>>> Greetings.
>>>
>>> I have spent TEN hours trying everything other than regressing to a 
>>> REALLY old version. I started out on 3.1.4 and have also tried 
>>> manually upgrading to 3.2.1, but the bug still exists.
>>>
>>> Somewhere along the way, the "auto" partitionable flag has broken.
>>>
>>> sudo mdadm --create --level=raid5 --auto=part2 /dev/md1 
>>> --metadata=1.2 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
>>>
>>> This only creates /dev/md1. It is of course possible to create one 
>>> big partition as /dev/md1p1 with any partitioning program, but 
>>> FORGET about trying to create /dev/md1p2.
>> What exactly did fdisk or parted report when you tried to partition 
>> /dev/md1 ?
> I run "sudo gparted /dev/md1" to access the whole RAID array, since I 
> like the GUI for precisely creating partitions. When making two ext4 
> partitions and applying the changes, it successfully creates 
> /dev/md1p1 (which does not exist before this operation is performed). 
> It then goes on to try to create md1p2 and sends the commands to 
> the md1 device, but md1p2 is never created. After the step of creating 
> the partition (which failed, but gparted does not know that), it tries 
> to set up the file system, which fails since there is no md1p2:
> mkfs.ext4 -j -O extent -L "" /dev/md1p2
> "mke2fs 1.41.14 (22-Dec-2010)
> Could not stat /dev/md1p2 --- No such file or directory"
>>> The problem is that the RAID array is NOT created in partitionable 
>>> mode, and only supports one large partition, despite ALL attempts at 
>>> EVERY format of the --auto option, you name it, -a part2, 
>>> --auto=mdp2, --auto=part2, --auto=p2, --auto=mdp, --auto=part, 
>>> --auto=p, --auto=p4, you name it and I've tried it!
>>>
>>> My guess is the functionality of creating partitionable arrays 
>>> literally DID break somewhere prior to/at version 3.1.4 which is the 
>>> earliest version I tried.
>> The mdadm<==>  kernel interface for this might be broken, but as a 
>> side-effect of the change to make all md devices support conventional 
>> partition tables.  I don't recall exactly when this changed, but it 
>> was several kernels ago.
>>
>> What kernel are you running?
> Linux Mint 11 RC, which uses 2.6.38-8-generic.
>>> I'm giving up and creating physical n-1 sized partitions on the 
>>> source disks and creating two RAID 5 arrays from those partitions 
>>> instead, but decided I really MUST report this bug so that other 
>>> people don't bang their head against the wall for ten hours of their 
>>> life as well. ;-)
>> Consider trying "mdadm --create" without the "--auto" option at all, 
>> then fdisk on the resulting array.
>>
>> Phil
> I've tried that as well during my testing since some postings 
> suggested that leaving out the option will create a partitionable 
> array, but it didn't.
>
>
> Christopher

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 17:32     ` Christopher White
@ 2011-05-13 17:40       ` Roman Mamedov
  2011-05-13 18:04         ` Christopher White
  2011-05-13 17:43       ` Phil Turmel
  1 sibling, 1 reply; 21+ messages in thread
From: Roman Mamedov @ 2011-05-13 17:40 UTC (permalink / raw)
  To: Christopher White; +Cc: Phil Turmel, linux-raid

On Fri, 13 May 2011 19:32:23 +0200
Christopher White <linux@pulseforce.com> wrote:

> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and 
> creating two partitions that way. It fails too.
> 
> This leads me to conclude that /dev/md1 was never created in 
> partitionable mode and that the kernel refuses to create anything beyond 
> a single partition on it.

Did you try running "blockdev --rereadpt /dev/md1"?

-- 
With respect,
Roman


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 17:32     ` Christopher White
  2011-05-13 17:40       ` Roman Mamedov
@ 2011-05-13 17:43       ` Phil Turmel
  1 sibling, 0 replies; 21+ messages in thread
From: Phil Turmel @ 2011-05-13 17:43 UTC (permalink / raw)
  To: Christopher White; +Cc: linux-raid

On 05/13/2011 01:32 PM, Christopher White wrote:
> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and creating two partitions that way. It fails too.

Please show "partx --show /dev/md1" after the fdisk operation above.

Phil

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 17:40       ` Roman Mamedov
@ 2011-05-13 18:04         ` Christopher White
  2011-05-13 18:18           ` Phil Turmel
  0 siblings, 1 reply; 21+ messages in thread
From: Christopher White @ 2011-05-13 18:04 UTC (permalink / raw)
  To: Roman Mamedov, Phil Turmel; +Cc: linux-raid



On 5/13/11 7:40 PM, Roman Mamedov wrote:
> On Fri, 13 May 2011 19:32:23 +0200
> Christopher White<linux@pulseforce.com>  wrote:
>
>> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
>> creating two partitions that way. It fails too.
>>
>> This leads me to conclude that /dev/md1 was never created in
>> partitionable mode and that the kernel refuses to create anything beyond
>> a single partition on it.
> Did you try running "blockdev --rereadpt /dev/md1"?
>
Hmm. Hmmmm. One more for good measure: Hmmmmmmm.

That's weird! Here's the thing: Fdisk is *just* for creating the 
partitions, not formatting them, so for that one it makes sense that you 
must re-read the partition table before you have a partition device to 
execute "mkfs.XXX" on.

However, Gparted on the other hand is BOTH for creating partition tables 
AND for executing the "make filesystem" commands (formatting). 
Therefore, Gparted is supposed to tell the kernel about partition table 
changes BEFORE trying to access the partitions it just created. 
Basically, Gparted goes: Blank disk, create partition table, create 
partitions, notify OS to re-scan the table, THEN access the new 
partition devices and format them. But instead, it skips the "notify OS" 
part when working with md-arrays!

When you use Gparted on PHYSICAL hard disks, it properly creates the 
partition table and the OS is updated to immediately see the new 
partition devices, to allow them to be formatted.

Therefore, what this has shown is that the necessary procedure in 
Gparted is:
* sudo gparted /dev/md1
* Create the partition table (gpt for instance)
* Create as many partitions as you need BUT SET THEIR TYPE TO 
"unformatted" (extremely important).
* Go back to a terminal and execute "sudo blockdev --rereadpt /dev/md1" 
to let the kernel see the new partition devices
* Now go back to Gparted and format the partitions, or just do it 
the CLI way with mkfs.ext4 manually. Either way, it will now work (a 
rough CLI-only equivalent of the whole procedure is sketched below).
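
For reference, the CLI-only equivalent would be roughly the following 
(untested as a complete sequence; the GPT label, the 50/50 split and the 
device name are just examples):

sudo parted --script /dev/md1 mklabel gpt
sudo parted --script /dev/md1 mkpart data1 ext4 1MiB 50%
sudo parted --script /dev/md1 mkpart data2 ext4 50% 100%
# the crucial step: make the kernel re-read the array's partition table
sudo blockdev --rereadpt /dev/md1
# the partition devices should now exist and can be formatted
sudo mkfs.ext4 /dev/md1p1
sudo mkfs.ext4 /dev/md1p2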

So how should we sum up this problem? Well, that depends. What is 
responsible for auto-discovering the new partitions when you use Gparted 
on a PHYSICAL disk (which works perfectly without manual re-scan 
commands)? 1) Is it Gparted telling the kernel to re-scan, or 2) is it 
the kernel that auto-watches physical disks for changes?

If 1), it means Gparted needs a bug fix to tell the kernel to re-scan 
the partition table for md-arrays when you re-partition them.
If 2), it means the kernel doesn't watch md-arrays for partition table 
changes, which debatably it should be doing.


Thoughts?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 18:04         ` Christopher White
@ 2011-05-13 18:18           ` Phil Turmel
  2011-05-13 18:54             ` Christopher White
  0 siblings, 1 reply; 21+ messages in thread
From: Phil Turmel @ 2011-05-13 18:18 UTC (permalink / raw)
  To: Christopher White; +Cc: Roman Mamedov, linux-raid

Hi Christopher,

On 05/13/2011 02:04 PM, Christopher White wrote:
> On 5/13/11 7:40 PM, Roman Mamedov wrote:
>> On Fri, 13 May 2011 19:32:23 +0200
>> Christopher White<linux@pulseforce.com>  wrote:
>>
>>> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
>>> creating two partitions that way. It fails too.
>>>
>>> This leads me to conclude that /dev/md1 was never created in
>>> partitionable mode and that the kernel refuses to create anything beyond
>>> a single partition on it.
>> Did you try running "blockdev --rereadpt /dev/md1"?
>>
> Hmm. Hmmmm. One more for good measure: Hmmmmmmm.
> 
> That's weird! Here's the thing: Fdisk is *just* for creating the partitions, not formatting them, so for that one it makes sense that you must re-read the partition table before you have a partition device to execute "mkfs.XXX" on.
> 
> However, Gparted on the other hand is BOTH for creating partition tables AND for executing the "make filesystem" commands (formatting). Therefore, Gparted is supposed to tell the kernel about partition table changes BEFORE trying to access the partitions it just created. Basically, Gparted goes: Blank disk, create partition table, create partitions, notify OS to re-scan the table, THEN access the new partition devices and format them. But instead, it skips the "notify OS" part when working with md-arrays!
> 
> When you use Gparted on PHYSICAL hard disks, it properly creates the partition table and the OS is updated to immediately see the new partition devices, to allow them to be formatted.

Indeed.  I suspect (g)parted is in fact requesting a rescan, but is being ignored.

I just tried this on one of my servers, and parted (v2.3) choked on an assertion.  Hmm.

> Therefore, what this has shown is that the necessary procedure in Gparted is:
> * sudo gparted /dev/md1
> * Create the partition table (gpt for instance)
> * Create as many partitions as you need BUT SET THEIR TYPE TO "unformatted" (extremely important).
> * Go back to a terminal and execute "sudo blockdev --rereadpt /dev/md1" to let the kernel see the new partition devices
> * Now go back to the Gparted and format the partitions, or just do it the CLI way with mkfs.ext4 manually. Either way, it will now work.
> 
> So how should we sum up this problem? Well, that depends. What is responsible for auto-discovering the new partitions when you use Gparted on a PHYSICAL disk (which works perfectly without manual re-scan commands)? 1) Is it Gparted telling the kernel to re-scan, or 2) is it the kernel that auto-watches physical disks for changes?

Generally, udev does it.  But based on my little test, I suspect parted is at fault.  fdisk did just fine.

> If 1), it means Gparted needs a bug fix to tell the kernel to re-scan the partition table for md-arrays when you re-partition them.
> If 2), it means the kernel doesn't watch md-arrays for partition table changes, which debatably it should be doing.

What is ignored or acted upon is decided by udev rules, as far as I know.  You might want to monitor udev events while running some of your tests (physical disk vs. MD).
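
For example, leaving

udevadm monitor --kernel --udev

running in a second terminal while you repartition should show whether a 
change event is generated at all, and whether udev acts on it.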

> Thoughts?

Phil

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 18:18           ` Phil Turmel
@ 2011-05-13 18:54             ` Christopher White
  2011-05-13 19:01               ` Rudy Zijlstra
  2011-05-13 19:22               ` Phil Turmel
  0 siblings, 2 replies; 21+ messages in thread
From: Christopher White @ 2011-05-13 18:54 UTC (permalink / raw)
  To: Phil Turmel, Roman Mamedov; +Cc: linux-raid

Hello again Phil (and Roman). Thanks to your back-and-forth, the bug has 
now finally been completely narrowed down: It is a bug in (g)parted!

The issue is that (g)parted doesn't properly call the kernel API for 
re-scanning the device when you operate on md disks as compared to 
physical disks.

Your information (Phil) that (g)parted chokes on an assertion is good 
information for when I report this bug. It's not impossible that you 
must handle md-disks differently from physical disks and that (g)parted 
is not aware of that distinction, therefore choking on the partition 
table rescan API.

Either way, this is fantastic news, because it means it's not an md 
kernel bug, where waiting for a fix would have severely pushed back my 
current project. I'm glad it was simply (g)parted failing to tell the 
kernel to re-read the partition tables.

---

With this bug out of the way (I'll be reporting it to parted's mailing 
list now), one thing that's been bugging me during my hours of research 
is that the vast majority of users use either a single, large RAID array 
and virtually partition that with LVM, or alternatively break each 
disk into many small partitions and make multiple smaller arrays out 
of those partitions. Very few people seem to use md's built-in support 
for partitionable raid arrays.

This makes me a tiny bit wary to trust the stability of md's 
partitionable implementation, even though I suspect it is rock solid. I 
suspect the reason that most people don't use the feature is for 
legacy/habit reasons, since md used to support only a single partition, 
so there's a vast amount of guides telling people to use LVM. Do any of 
you know anything about this, and can you advise whether I should go for 
a single-partition MD array with LVM, or a partitionable MD array?

As far as performance goes, the CPU overhead of LVM is in the 1-5% range 
from what I've heard, and I have zero need for the other features LVM 
provides (snapshots, backups, online resizing, clusters of disks acting 
as one disk, etc), so it just feels completely overkill and worthless 
when all I need is a single, partitionable RAID array.

All I need is the ability to (in the future) add more disks to the 
array, grow the array, and then resize and move the partitions around 
using regular partitioning tools, treating the RAID array as a single 
disk. md's partitionable arrays support doing this since they act as a 
disk: if you add more hard disks to your array, the available, 
unallocated space on that array simply grows, and partitions on it can 
be expanded and relocated to take advantage of this. I don't need LVM 
for any of that, as long as md's implementation is stable.
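
(The grow step I have in mind is just the usual, roughly:

sudo mdadm --add /dev/md1 /dev/sde2
sudo mdadm --grow /dev/md1 --raid-devices=5 --backup-file=/root/md1-grow.bak

followed by resizing/moving the partitions on /dev/md1 with a normal 
partitioning tool. The device names and backup file here are only an 
example.)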


Christopher

On 5/13/11 8:18 PM, Phil Turmel wrote:
> Hi Christopher,
>
> On 05/13/2011 02:04 PM, Christopher White wrote:
>> On 5/13/11 7:40 PM, Roman Mamedov wrote:
>>> On Fri, 13 May 2011 19:32:23 +0200
>>> Christopher White<linux@pulseforce.com>   wrote:
>>>
>>>> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
>>>> creating two partitions that way. It fails too.
>>>>
>>>> This leads me to conclude that /dev/md1 was never created in
>>>> partitionable mode and that the kernel refuses to create anything beyond
>>>> a single partition on it.
>>> Did you try running "blockdev --rereadpt /dev/md1"?
>>>
>> Hmm. Hmmmm. One more for good measure: Hmmmmmmm.
>>
>> That's weird! Here's the thing: Fdisk is *just* for creating the partitions, not formatting them, so for that one it makes sense that you must re-read the partition table before you have a partition device to execute "mkfs.XXX" on.
>>
>> However, Gparted on the other hand is BOTH for creating partition tables AND for executing the "make filesystem" commands (formatting). Therefore, Gparted is supposed to tell the kernel about partition table changes BEFORE trying to access the partitions it just created. Basically, Gparted goes: Blank disk, create partition table, create partitions, notify OS to re-scan the table, THEN access the new partition devices and format them. But instead, it skips the "notify OS" part when working with md-arrays!
>>
>> When you use Gparted on PHYSICAL hard disks, it properly creates the partition table and the OS is updated to immediately see the new partition devices, to allow them to be formatted.
> Indeed.  I suspect (g)parted is in fact requesting a rescan, but is being ignored.
>
> I just tried this on one of my servers, and parted (v2.3) choked on an assertion.  Hmm.
>
>> Therefore, what this has shown is that the necessary procedure in Gparted is:
>> * sudo gparted /dev/md1
>> * Create the partition table (gpt for instance)
>> * Create as many partitions as you need BUT SET THEIR TYPE TO "unformatted" (extremely important).
>> * Go back to a terminal and execute "sudo blockdev --rereadpt /dev/md1" to let the kernel see the new partition devices
>> * Now go back to the Gparted and format the partitions, or just do it the CLI way with mkfs.ext4 manually. Either way, it will now work.
>>
>> So how should we sum up this problem? Well, that depends. What is responsible for auto-discovering the new partitions when you use Gparted on a PHYSICAL disk (which works perfectly without manual re-scan commands)? 1) Is it Gparted telling the kernel to re-scan, or 2) is it the kernel that auto-watches physical disks for changes?
> Generally, udev does it.  But based on my little test, I suspect parted is at fault.  fdisk did just fine.
>
>> If 1), it means Gparted needs a bug fix to tell the kernel to re-scan the partition table for md-arrays when you re-partition them.
>> If 2), it means the kernel doesn't watch md-arrays for partition table changes, which debatably it should be doing.
> What is ignored or acted upon is decided by udev rules, as far as I know.  You might want to monitor udev events while running some of your tests (physical disk vs. MD).
>
>> Thoughts?
> Phil

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 18:54             ` Christopher White
@ 2011-05-13 19:01               ` Rudy Zijlstra
  2011-05-13 19:49                 ` Christopher White
  2011-05-13 19:49                 ` Christopher White
  2011-05-13 19:22               ` Phil Turmel
  1 sibling, 2 replies; 21+ messages in thread
From: Rudy Zijlstra @ 2011-05-13 19:01 UTC (permalink / raw)
  To: Christopher White; +Cc: Phil Turmel, Roman Mamedov, linux-raid

Hi Chris,


I've run partitioned MD disks for several years now. I do that on systems 
where I use md for the system partitions. One mirror with partitions for 
the different system aspects. I prefer that, as it best reflects the 
actual physical configuration, and all partitions will be degraded at 
the same time when 1 disk develops a problem (which is unfortunately not 
the case when you partition the disk and then mirror the partitions).

As I am a bit lazy and have only limited wish to fight with 
BIOS/bootloader conflicts / vagaries, these systems typically boot from 
the network (kernel gets loaded from the network, from there onwards all 
is on the local disk).

Cheers,



Rudy



On 05/13/2011 08:54 PM, Christopher White wrote:
> Hello again Phil (and Roman). Thanks to your back-and-forth, the bug 
> has now finally been completely narrowed down: It is a bug in (g)parted!
>
> The issue is that (g)parted doesn't properly call the kernel API for 
> re-scanning the device when you operate on md disks as compared to 
> physical disks.
>
> Your information (Phil) that (g)parted chokes on an assertion is good 
> information for when I report this bug. It's not impossible that you 
> must handle md-disks differently from physical disks and that 
> (g)parted is not aware of that distinction, therefore choking on the 
> partition table rescan API.
>
> Either way, this is fantastic news, because it means it's not an md 
> kernel bug, where waiting for a fix would have severely pushed back my 
> current project. I'm glad it was simply (g)parted failing to tell the 
> kernel to re-read the partition tables.
>
> ---
>
> With this bug out of the way (I'll be reporting it to parted's mailing 
> list now), one thing that's been bugging me during my hours of research 
> is that the vast majority of users use either a single, large RAID 
> array and virtually partition that with LVM, or alternatively break 
> each disk into many small partitions and make multiple smaller 
> arrays out of those partitions. Very few people seem to use md's 
> built-in support for partitionable raid arrays.
>
> This makes me a tiny bit wary to trust the stability of md's 
> partitionable implementation, even though I suspect it is rock solid. 
> I suspect the reason that most people don't use the feature is for 
> legacy/habit reasons, since md used to support only a single 
> partition, so there's a vast amount of guides telling people to use 
> LVM. Do any of you know anything about this and can advise on whether 
> I should go for a single-partition MD array with LVM, or a 
> partitionable MD array?
>
> As far as performance goes, the CPU overhead of LVM is in the 1-5% 
> range from what I've heard, and I have zero need for the other 
> features LVM provides (snapshots, backups, online resizing, clusters 
> of disks acting as one disk, etc), so it just feels completely 
> overkill and worthless when all I need is a single, partitionable RAID 
> array.
>
> All I need is the ability to (in the future) add more disks to the 
> array, grow the array, and then resize+move the partitions around 
> using regular partitioning tools treating the RAID array as a single 
> disk, and md's partitionable arrays support doing this since they act 
> as a disk, where if you add more hard disks to your array; the 
> available, unallocated space on that array simply grows and partitions 
> on it can be expanded and relocated to take advantage of this. I don't 
> need LVM for any of that, as long as md's implementation is stable.
>
>
> Christopher
>
> On 5/13/11 8:18 PM, Phil Turmel wrote:
>> Hi Christopher,
>>
>> On 05/13/2011 02:04 PM, Christopher White wrote:
>>> On 5/13/11 7:40 PM, Roman Mamedov wrote:
>>>> On Fri, 13 May 2011 19:32:23 +0200
>>>> Christopher White<linux@pulseforce.com>   wrote:
>>>>
>>>>> I forgot to mention that I've also tried "sudo fdisk /dev/md1" and
>>>>> creating two partitions that way. It fails too.
>>>>>
>>>>> This leads me to conclude that /dev/md1 was never created in
>>>>> partitionable mode and that the kernel refuses to create anything 
>>>>> beyond
>>>>> a single partition on it.
>>>> Did you try running "blockdev --rereadpt /dev/md1"?
>>>>
>>> Hmm. Hmmmm. One more for good measure: Hmmmmmmm.
>>>
>>> That's weird! Here's the thing: Fdisk is *just* for creating the 
>>> partitions, not formatting them, so for that one it makes sense that 
>>> you must re-read the partition table before you have a partition 
>>> device to execute "mkfs.XXX" on.
>>>
>>> However, Gparted on the other hand is BOTH for creating partition 
>>> tables AND for executing the "make filesystem" commands 
>>> (formatting). Therefore, Gparted is supposed to tell the kernel 
>>> about partition table changes BEFORE trying to access the partitions 
>>> it just created. Basically, Gparted goes: Blank disk, create 
>>> partition table, create partitions, notify OS to re-scan the table, 
>>> THEN access the new partition devices and format them. But instead, 
>>> it skips the "notify OS" part when working with md-arrays!
>>>
>>> When you use Gparted on PHYSICAL hard disks, it properly creates the 
>>> partition table and the OS is updated to immediately see the new 
>>> partition devices, to allow them to be formatted.
>> Indeed.  I suspect (g)parted is in fact requesting a rescan, but is 
>> being ignored.
>>
>> I just tried this on one of my servers, and parted (v2.3) choked on 
>> an assertion.  Hmm.
>>
>>> Therefore, what this has shown is that the necessary procedure in 
>>> Gparted is:
>>> * sudo gparted /dev/md1
>>> * Create the partition table (gpt for instance)
>>> * Create as many partitions as you need BUT SET THEIR TYPE TO 
>>> "unformatted" (extremely important).
>>> * Go back to a terminal and execute "sudo blockdev --rereadpt 
>>> /dev/md1" to let the kernel see the new partition devices
>>> * Now go back to the Gparted and format the partitions, or just do 
>>> it the CLI way with mkfs.ext4 manually. Either way, it will now work.
>>>
>>> So how should we sum up this problem? Well, that depends. What is 
>>> responsible for auto-discovering the new partitions when you use 
>>> Gparted on a PHYSICAL disk (which works perfectly without manual 
>>> re-scan commands)? 1) Is it Gparted telling the kernel to re-scan, 
>>> or 2) is it the kernel that auto-watches physical disks for changes?
>> Generally, udev does it.  But based on my little test, I suspect 
>> parted is at fault.  fdisk did just fine.
>>
>>> If 1), it means Gparted needs a bug fix to tell the kernel to 
>>> re-scan the partition table for md-arrays when you re-partition them.
>>> If 2), it means the kernel doesn't watch md-arrays for partition 
>>> table changes, which debatably it should be doing.
>> What is ignored or acted upon is decided by udev rules, as far as I 
>> know.  You might want to monitor udev events while running some of 
>> your tests (physical disk vs. MD).
>>
>>> Thoughts?
>> Phil


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 18:54             ` Christopher White
  2011-05-13 19:01               ` Rudy Zijlstra
@ 2011-05-13 19:22               ` Phil Turmel
  2011-05-13 19:32                 ` Roman Mamedov
  1 sibling, 1 reply; 21+ messages in thread
From: Phil Turmel @ 2011-05-13 19:22 UTC (permalink / raw)
  To: Christopher White; +Cc: Roman Mamedov, linux-raid

Hi Christopher,

On 05/13/2011 02:54 PM, Christopher White wrote:
> Hello again Phil (and Roman). Thanks to your back-and-forth, the bug has now finally been completely narrowed down: It is a bug in (g)parted!

Good to know.  A pointer to the formal bug report would be a good followup, when you have it.

> The issue is that (g)parted doesn't properly call the kernel API for re-scanning the device when you operate on md disks as compared to physical disks.
> 
> Your information (Phil) that (g)parted chokes on an assertion is good information for when I report this bug. It's not impossible that you must handle md-disks differently from physical disks and that (g)parted is not aware of that distinction, therefore choking on the partition table rescan API.
> 
> Either way, this is fantastic news, because it means it's not an md kernel bug, where waiting for a fix would have severely pushed back my current project. I'm glad it was simply (g)parted failing to tell the kernel to re-read the partition tables.
> 
> ---
> 
> With this bug out of the way (I'll be reporting it to parted's mailing list now), one thing that's been bugging me during my hours of research is that the vast majority of users use either a single, large RAID array and virtually partition that with LVM, or alternatively break each disk into many small partitions and make multiple smaller arrays out of those partitions. Very few people seem to use md's built-in support for partitionable raid arrays.
> 
> This makes me a tiny bit wary to trust the stability of md's partitionable implementation, even though I suspect it is rock solid. I suspect the reason that most people don't use the feature is for legacy/habit reasons, since md used to support only a single partition, so there's a vast amount of guides telling people to use LVM. Do any of you know anything about this and can advise on whether I should go for a single-partition MD array with LVM, or a partitionable MD array?
> 
> As far as performance goes, the CPU overhead of LVM is in the 1-5% range from what I've heard, and I have zero need for the other features LVM provides (snapshots, backups, online resizing, clusters of disks acting as one disk, etc), so it just feels completely overkill and worthless when all I need is a single, partitionable RAID array.

I always use LVM.  While the lack of attention to MD partitions might justify that, the real reason is the sheer convenience of creating, manipulating, and deleting logical volumes on the fly.  While you may not need it *now*, when you discover that you *do* need it, you won't be able to use it.  Online resizing of any of your LVs is the killer feature.
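
For instance, growing a volume and the filesystem on it, online, is as 
simple as something like this (volume and group names made up):

lvextend -L +50G vg0/data
resize2fs /dev/vg0/data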

Also, I'd be shocked if the LVM overhead for plain volumes was close to 1%.  In fact, I'd be surprised if it was even 0.1%.  Do you have any benchmarks that show otherwise?

> All I need is the ability to (in the future) add more disks to the array, grow the array, and then resize+move the partitions around using regular partitioning tools treating the RAID array as a single disk, and md's partitionable arrays support doing this since they act as a disk, where if you add more hard disks to your array; the available, unallocated space on that array simply grows and partitions on it can be expanded and relocated to take advantage of this. I don't need LVM for any of that, as long as md's implementation is stable.

If you can take the downtime, this is true.

Phil

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 19:22               ` Phil Turmel
@ 2011-05-13 19:32                 ` Roman Mamedov
  2011-05-13 19:39                   ` Phil Turmel
  2011-05-14 10:10                   ` David Brown
  0 siblings, 2 replies; 21+ messages in thread
From: Roman Mamedov @ 2011-05-13 19:32 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Christopher White, linux-raid

On Fri, 13 May 2011 15:22:09 -0400
Phil Turmel <philip@turmel.org> wrote:

> I always use LVM.  While the lack of attention to MD partitions might
> justify that, the real reason is the sheer convenience of creating,
> manipulating, and deleting logical volumes on the fly.  While you may not
> need it *now*, when you discover that you *do* need it, you won't be able to
> use it.  Online resizing of any of your LVs is the killer feature.

Can it defragment non-contiguous LVs yet?

-- 
With respect,
Roman


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 19:32                 ` Roman Mamedov
@ 2011-05-13 19:39                   ` Phil Turmel
  2011-05-14 10:10                   ` David Brown
  1 sibling, 0 replies; 21+ messages in thread
From: Phil Turmel @ 2011-05-13 19:39 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Christopher White, linux-raid

On 05/13/2011 03:32 PM, Roman Mamedov wrote:
> On Fri, 13 May 2011 15:22:09 -0400
> Phil Turmel <philip@turmel.org> wrote:
> 
>> I always use LVM.  While the lack of attention to MD partitions might
>> justify that, the real reason is the sheer convenience of creating,
>> manipulating, and deleting logical volumes on the fly.  While you may not
>> need it *now*, when you discover that you *do* need it, you won't be able to
>> use it.  Online resizing of any of your LVs is the killer feature.
> 
> Can it defragment non-contiguous LVs yet?

Automatically, no (so far as I've seen).  If you have a suitable free space chunk in the group, though, you can manually create a contiguous mirror, let it sync, then remove the original segments.  Do it twice if placement is critical.
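
Roughly, from memory (so double-check the man pages first; names here 
are placeholders):

lvconvert -m 1 --alloc contiguous vg0/mylv
# wait for the mirror to sync, then drop the original (fragmented) leg
# by naming the PV(s) it lives on
lvconvert -m 0 vg0/mylv /dev/sdb1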

Phil

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 19:01               ` Rudy Zijlstra
@ 2011-05-13 19:49                 ` Christopher White
  2011-05-13 20:00                   ` Rudy Zijlstra
  2011-05-13 19:49                 ` Christopher White
  1 sibling, 1 reply; 21+ messages in thread
From: Christopher White @ 2011-05-13 19:49 UTC (permalink / raw)
  To: Rudy Zijlstra, Roman Mamedov, Phil Turmel; +Cc: linux-raid

On 5/13/11 9:01 PM, Rudy Zijlstra wrote:
> Hi Chris,
>
>
> I've run partitioned MD disks for several years now. I do that on 
> systems where i use md for the system partitions. One mirror with 
> partitions for the different system aspects. I prefer that, as it 
> reflects best the actual physical configuration, and all partitions 
> will be degraded at the same time when 1 disk develops a problem 
> (which is unfortunately not the case when you partition the disk and 
> then mirror the partitions).
>
> As i am a bit lazy and have only limited wish to fight with 
> BIOS/bootloader conflicts / vagaries, these systems typically boot 
> from the network (kernel gets loaded from the network, from there 
> onwards all is on the local disk).
>
> Cheers,
>
>
>
> Rudy
Thank you for the information, Rudy,

Your experience of running partitioned MD arrays for years shows that it 
is indeed stable. The reason for wanting to skip LVM was that it's one 
less performance-penalty layer, one less layer to configure, one less 
possible point of failure, etc.

However, Phil again brings up the main fear that's been nagging me, and 
that is that MD's partitioning support receives less love (use) and 
therefore risks having bugs that go undiscovered for ages and (gasp) may 
even risk corrupting the data. People are just so used to LVM, since MD 
used to be single-partition only, that LVM plus a single-partition MD 
array is far more mature and far more widely used.

My main reason against LVM was the performance penalty, which I had read 
was in the 1-5% range, but I just did a new search and saw threads 
showing that any performance-hit claim is outdated and that LVM2 is 
extremely efficient. In fact the CPU load didn't seem to be impacted by 
more than 0.1% or so in the graphs I saw.

By the way, Rudy, as for your boot conflicts and the fact that you 
resort to running a network boot, that was only a problem in the past 
when bootloaders did not support software RAID. Grub2 supports GPT, MD 
arrays with metadata 1.2, and can fully boot from a system (with /boot) 
installation located on your MD array. All you'll have to do is make 
sure your /boot partition (and the whole system if you want to) is on a 
RAID 1 (mirrored) array, and that you install the Grub2 bootloader on 
every physical disk. This means that it goes:

Computer starts up -> BIOS/EFI picks any of the hard drives to boot from 
-> GRUB2 loads -> GRUB2 sees the MD RAID1 array and picks ANY of the 
disks to boot from (since they are all mirrored) and treats it as a 
regular, raw disk as if you didn't use an array at all.
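
(Installing the bootloader onto every member disk is just a matter of 
repeating grub-install, e.g.:

sudo grub-install /dev/sda
sudo grub-install /dev/sdb

and so on for each disk in the RAID 1 set; the device names are only an 
example.)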

I think you may have to do some slight extra work to get the system disk 
to mount as RAID 1 for the OS and RAID 5 for your other array(s) after 
the kernel has booted: you have to first boot into a RAM filesystem so 
that the disk can be unmounted and re-mounted as a RAID 1 array. But 
it's not hard, and there are guides for it. Just get a 2.6-series 
kernel, grub2, a RAID 1 array for the OS, and a guide and you will be 
set. It will remove the need for you to keep a network PXE boot server.

On 5/13/11 9:22 PM, Phil Turmel wrote:
> Hi Christopher,
>
> On 05/13/2011 02:54 PM, Christopher White wrote:
>> Hello again Phil (and Roman). Thanks to your back-and-forth, the bug has now finally been completely narrowed down: It is a bug in (g)parted!
> Good to know.  A pointer to the formal bug report would be a good followup, when you have it.
I've submitted a detailed report to the bug-parted mailing list and want 
to sincerely thank ALL of you for your discussion to help narrow it 
down. Thank you very much! The bug-parted archive seems slow to refresh, 
but the posting is called "[Confirmed Bug] Parted does not notify kernel 
when modifying partition tables in partitionable md arrays" and was 
posted about 40 minutes ago. It contains a list of the steps to 
reproduce the bug and the theories of why it happens. It should show up 
here eventually: 
http://lists.gnu.org/archive/html/bug-parted/2011-05/threads.html
> I always use LVM.  While the lack of attention to MD partitions might justify that, the real reason is the sheer convenience of creating, manipulating, and deleting logical volumes on the fly.  While you may not need it *now*, when you discover that you *do* need it, you won't be able to use it.  Online resizing of any of your LVs is the killer feature.
>
> Also, I'd be shocked if the LVM overhead for plain volumes was close to 1%.  In fact, I'd be surprised if it was even 0.1%.  Do you have any benchmarks that show otherwise?
As I wrote above, you reinforce the fear I have that the lack of 
attention to MD partitions is an added risk, compared to the 
ultra-well-maintained LVM2 layer. Now that the performance question is 
out of the way, I will actually go for LVM. Online resizing and so on 
isn't very interesting since I use the partitions for storage that isn't 
in need of 100% availability, but the fact that LVM can be trusted with 
my life whereas MD partitions are rarely used, and the fact that LVM(2) 
turned out to be extremely efficient CPU-wise, just settles it.


Christopher

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 19:49                 ` Christopher White
@ 2011-05-13 20:00                   ` Rudy Zijlstra
  0 siblings, 0 replies; 21+ messages in thread
From: Rudy Zijlstra @ 2011-05-13 20:00 UTC (permalink / raw)
  To: Christopher White; +Cc: Roman Mamedov, Phil Turmel, linux-raid

On 05/13/2011 09:49 PM, Christopher White wrote:
> On 5/13/11 9:01 PM, Rudy Zijlstra wrote:
>> Hi Chris,
>>
>>
>> I've run partitioned MD disks for several years now. I do that on 
>> systems where i use md for the system partitions. One mirror with 
>> partitions for the different system aspects. I prefer that, as it 
>> reflects best the actual physical configuration, and all partitions 
>> will be degraded at the same time when 1 disk develops a problem 
>> (which is unfortunately not the case when you partition the disk and 
>> then mirror the partitions).
>>
>> As i am a bit lazy and have only limited wish to fight with 
>> BIOS/bootloader conflicts / vagaries, these systems typically boot 
>> from the network (kernel gets loaded from the network, from there 
>> onwards all is on the local disk).
>>
>> Cheers,
>>
>>
>>
>> Rudy
> Thank you for the information, Rudy,
>
> Your experience of running partitioned MD arrays for years shows that 
> it is indeed stable. The reason for wanting to skip LVM was that it's 
> one less performance-penalty layer, one less layer to configure, one 
> less possible point of failure, etc.

I skip LVM because for my usage pattern it only gives me an additional 
management layer... an additional layer to configure.

>
> However, Phil again brings up the main fear that's been nagging me, 
> and that is that MD's partitioning support receives less love (use) 
> and therefore risks having bugs that go undiscovered for ages and 
> (gasp) may even risk corrupting the data. People are just so used to 
> LVM since MD used to be single-partition only, that 
> LVM+single-partition MD array is far more mature and far more in use.
The MD layer and the LVM layer are independently maintained. Their 
regular use together would surface any bugs more quickly, though.

>
> My main reason against LVM was the performance penalty, where I had 
> read that it was in the 1-5% range, but I just did a new search and 
> saw threads showing that any performance hit claim is outdated and 
> that LVM2 is extremely efficient. In fact the CPU load didn't seem to 
> be impacted more than 0.1% or so in the graphs I saw.
>
> By the way, Rudy, as for your boot conflicts and the fact that you 
> resort to running a network boot, that was only a problem in the past 
> when bootloaders did not support software RAID. Grub2 supports GPT, MD 
> arrays with metadata 1.2, and can fully boot from a system (with 
> /boot) installation located on your MD array. All you'll have to do is 
> make sure your /boot partition (and the whole system if you want to) 
> is on a RAID 1 (mirrored) array, and that you install the Grub2 
> bootloader on every physical disk. This means that it goes:
>

I know... but I happen to dislike grub2, and my network boot environment 
is stable and well maintained.
Grub2 is for me a step backwards: more difficult to configure, and I've 
gone back to lilo as my main bootloader.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-13 19:32                 ` Roman Mamedov
  2011-05-13 19:39                   ` Phil Turmel
@ 2011-05-14 10:10                   ` David Brown
  2011-05-14 10:24                     ` Roman Mamedov
  1 sibling, 1 reply; 21+ messages in thread
From: David Brown @ 2011-05-14 10:10 UTC (permalink / raw)
  To: linux-raid

On 13/05/11 21:32, Roman Mamedov wrote:
> On Fri, 13 May 2011 15:22:09 -0400
> Phil Turmel<philip@turmel.org>  wrote:
>
>> I always use LVM.  While the lack of attention to MD partitions might
>> justify that, the real reason is the sheer convenience of creating,
>> manipulating, and deleting logical volumes on the fly.  While you may not
>> need it *now*, when you discover that you *do* need it, you won't be able to
>> use it.  Online resizing of any of your LVs is the killer feature.
>
> Can it defragment non-contiguous LVs yet?
>

What is perhaps more relevant is: can filesystems see the fragmentation 
of the LVs?  I don't know the answer.

Fragmentation of files is not a problem unless files are split into 
/lots/ of small pieces.  The bad reputation of fragmentation has come 
from the DOS/Windows world, where poor filesystems combined with 
shotgun-style allocators give you much slower performance than necessary.

Modern Linux filesystems have various techniques to keep fragmentation 
to a minimum.  But (AFAIK) they make the assumption that the underlying 
device is contiguous.  If the filesystem /knows/ that the device is in 
bits, then it could take that into account in its allocation policy (in 
the same way that it takes raid stripes into account).

Still, you don't usually have many segments in an LV - if you want the 
LV to be fast, you can request it to be contiguous when creating it. 
Then you only get a fragment for each time it is grown.  It's a price 
often worth paying for the flexibility.
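
For example, something like this at creation time (size and names are 
purely illustrative):

lvcreate --alloc contiguous -L 200G -n fastlv vg0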



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-14 10:10                   ` David Brown
@ 2011-05-14 10:24                     ` Roman Mamedov
  2011-05-14 12:56                       ` David Brown
  0 siblings, 1 reply; 21+ messages in thread
From: Roman Mamedov @ 2011-05-14 10:24 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1091 bytes --]

On Sat, 14 May 2011 12:10:56 +0200
David Brown <david.brown@hesbynett.no> wrote:

> What is perhaps more relevant, is can filesystems see the fragmentation 
> of the LV's?  I don't know the answer.

No, of course they can't.

> Still, you don't usually have many segments in an LV - if you want the 
> LV to be fast, you can request it to be contiguous when creating it. 
> Then you only get a fragment for each time it is grown.  It's a price 
> often worth paying for the flexibility.

From what I see, the key selling point for LVM is the ability to 'easily'
add/remove/resize LVs. And then if you buy that and start to actively use
these features, you end up in a situation (badly fragmented LVs) from which
there isn't a proper way out. No - backup and restore, or 'have enough
contiguous free space to mirror your entire LV and then nuke the original' are
not the answer. What's sad is that there isn't any fundamental technical
reason LVs can't be defragmented. They can, just no one has bothered to write
the corresponding code yet.
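
(For anyone curious how fragmented their LVs actually are, the segment 
layout can be inspected with something like the following - names are 
placeholders, untested as typed:)

lvs --segments -o +devices vg0
lvdisplay -m /dev/vg0/somelv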

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-14 10:24                     ` Roman Mamedov
@ 2011-05-14 12:56                       ` David Brown
  2011-05-14 13:27                         ` Drew
  0 siblings, 1 reply; 21+ messages in thread
From: David Brown @ 2011-05-14 12:56 UTC (permalink / raw)
  To: linux-raid

On 14/05/11 12:24, Roman Mamedov wrote:
> On Sat, 14 May 2011 12:10:56 +0200
> David Brown<david.brown@hesbynett.no>  wrote:
>
>> What is perhaps more relevant, is can filesystems see the fragmentation
>> of the LV's?  I don't know the answer.
>
> No, of course they can't.
>
>> Still, you don't usually have many segments in an LV - if you want the
>> LV to be fast, you can request it to be contiguous when creating it.
>> Then you only get a fragment for each time it is grown.  It's a price
>> often worth paying for the flexibility.
>
>  From what I see, the key selling point for LVM is the ability to 'easily'
> add/remove/resize LVs. And then if you buy that and start to actively use
> these features, you end up in a situation (badly fragmented LVs) from which
> there isn't a proper way out. No - backup and restore, or 'have enough
> contiguous free space to mirror your entire LV and then nuke the original' are
> not the answer. What's sad is that there isn't any fundamental technical
> reason LVs can't be defragmented. They can, just no one has bothered to write
> the corresponding code yet.
>

I'm sure that LVs could be defragmented - there is already code to move 
them around on the disks (such as to move them out of a PV before 
deleting the PV).  I don't know why it hasn't been implemented - maybe 
there are too few people working on LVM, maybe it is a low priority, or 
maybe LV fragmentation makes very little measurable difference in 
practice.
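
Emptying a PV is already routine today; a rough sketch (device and 
volume group names are placeholders, untested as typed):

pvmove /dev/sdd1         # migrate all extents off the PV
vgreduce vg0 /dev/sdd1   # remove it from the volume group
pvremove /dev/sdd1       # wipe the PV label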

Personally, I find LVM to be a hugely useful tool.  I like being able to 
make new logical volumes when I need them, and resize them as 
convenient.  For servers, I make heavy use of OpenVZ lightweight 
virtualisation, and I make a new LV for each "machine".  So setting up a 
new "server" with its own "disk" is done in a couple of minutes.  And if 
the needs of the "server" outgrow its bounds, it's easy to extend it.
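
Roughly what I mean, with placeholder names and sizes (the mount point 
depends on the OpenVZ layout, untested as typed):

lvcreate -L 20G -n vz101 vg0
mkfs.ext4 /dev/vg0/vz101
mount /dev/vg0/vz101 /vz/private/101
# and later, if it outgrows its bounds:
lvextend -L +10G /dev/vg0/vz101
resize2fs /dev/vg0/vz101    # ext3/ext4 can be grown while mounted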

I've had plenty of other cases where LVM has saved me a lot of time and 
effort.  Not long ago I temporarily needed a bit more space on a server, 
and didn't have the time or a spare disk to expand it properly.  So I 
added a USB disk I had lying around, made a PV on it, and extended the 
server's LVs onto that disk.  Obviously this sort of thing gives a 
performance hit - but it was better to be slow than not working.  For 
me, LVM's flexibility is worth the minor performance cost.
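
The steps for that sort of rescue are roughly (device and LV names are 
placeholders, untested as typed):

pvcreate /dev/sde1       # the USB disk
vgextend vg0 /dev/sde1
lvextend -L +100G /dev/vg0/data /dev/sde1   # new extents go on the USB PV
resize2fs /dev/vg0/data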


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-14 12:56                       ` David Brown
@ 2011-05-14 13:27                         ` Drew
  2011-05-14 18:21                           ` David Brown
  0 siblings, 1 reply; 21+ messages in thread
From: Drew @ 2011-05-14 13:27 UTC (permalink / raw)
  To: linux-raid

> I'm sure that LV's could be defragmented - there is already code to move
> them around on the disks (such as to move them out of a PV before deleting
> the PV).  I don't know why it hasn't been implemented - maybe there are too
> few people working on LVM, or that it is a low priority, or that LV
> fragmentation makes very little measurable difference in practice.

I've always figured it was because fragmentation in the LVs causes
little performance degradation.  If we were talking about LVs composed
of hundreds of fragments I would expect to see degradation, but I've
never come across a scenario where LVs have been that bad.

Someone referred to DOS in an earlier post and I think that's a good
example of why it matters.  I maintain a bunch of Windows-based machines
at work, and I did some performance benchmarking between a traditional
defrag utility and some of the "professional" versions.  Bells and
whistles aside, what set most of the Pro versions apart from the
standard defrag utilities was the concept of a "good enough" defrag,
which basically consolidates files into several larger fragments rather
than defragmenting them completely.  I ran tests on filesystem
performance before and after defragging drives with both options, and
the difference between a full defrag and a "good enough" defrag was minimal.


-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

"This started out as a hobby and spun horribly out of control."
-Unknown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: mdadm does not create partition devices whatsoever, "partitionable" functionality broken
  2011-05-14 13:27                         ` Drew
@ 2011-05-14 18:21                           ` David Brown
  0 siblings, 0 replies; 21+ messages in thread
From: David Brown @ 2011-05-14 18:21 UTC (permalink / raw)
  To: linux-raid

On 14/05/11 15:27, Drew wrote:
>> I'm sure that LV's could be defragmented - there is already code to move
>> them around on the disks (such as to move them out of a PV before deleting
>> the PV).  I don't know why it hasn't been implemented - maybe there are too
>> few people working on LVM, or that it is a low priority, or that LV
>> fragmentation makes very little measurable difference in practice.
>
> I've always figured it was because fragmentation in the LV's caused
> little performance degradation. If we were talking about LV's composed
> of hundreds of fragments I would expect to see degradation but I've
> never come across a scenario where LV's have been that bad.
>

I too think the fragmentation is probably a small effect - but perhaps 
a measurable one nonetheless.  I've never seen any benchmarks on it. 
However, I know that some filesystems (such as xfs in particular, and 
ext4 to a lesser extent) go out of their way to reduce the risk of file 
fragmentation - if it is worth their effort, then perhaps it is also 
worth it for LVM.
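
(File-level fragmentation is at least easy to measure, if anyone wants 
numbers; paths are placeholders, untested as typed:)

filefrag -v /path/to/some/file   # extent count on ext4 and friends
xfs_bmap -v /path/to/some/file   # the same idea for XFS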

> Someone refered to DOS in an earlier post and I think that's a good
> example of relevance. I maintain a bunch of Windows based machines at
> work and I did some performance benchmarking between a traditional
> defrag utility and some of the "professional" versions. Bells and
> whistles aside, what set most of the Pro versions apart from the
> standard defrag utilities was the concept of "good enough" defrag,
> which basically puts files into several larger fragments as opposed to
> a complete defrag. I ran tests on filesystem performance before and
> after defraging drives with both options and the change in performance
> between a full defrag and a "good enough" defrag was minimal.
>

You will probably also find that the real-world difference between no 
defrag and a "good enough" defrag is minimal.

There are some heavily used files and directories that get so badly 
fragmented on Windows systems that they can benefit from a defrag - for 
example the registry files, the Windows directory, and some of the NTFS 
structures.  Of course, these are the parts that normal defrag utilities 
can't help with - they can't be defragged while the system is running. 
But for most other parts of the system, defrag makes very little real 
difference, especially as the benefit is so temporary.

In the old days, before DOS and Windows had any sort of file or disk 
cache, defragging had a bigger effect.  But now you are far better off 
spending money on some extra RAM for more cache space than on 
"professional" defrag programs.




^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2011-05-14 18:21 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-13 15:13 mdadm does not create partition devices whatsoever, "partitionable" functionality broken Christopher White
2011-05-13 16:49 ` Phil Turmel
2011-05-13 17:18   ` Christopher White
2011-05-13 17:32     ` Christopher White
2011-05-13 17:40       ` Roman Mamedov
2011-05-13 18:04         ` Christopher White
2011-05-13 18:18           ` Phil Turmel
2011-05-13 18:54             ` Christopher White
2011-05-13 19:01               ` Rudy Zijlstra
2011-05-13 19:49                 ` Christopher White
2011-05-13 20:00                   ` Rudy Zijlstra
2011-05-13 19:49                 ` Christopher White
2011-05-13 19:22               ` Phil Turmel
2011-05-13 19:32                 ` Roman Mamedov
2011-05-13 19:39                   ` Phil Turmel
2011-05-14 10:10                   ` David Brown
2011-05-14 10:24                     ` Roman Mamedov
2011-05-14 12:56                       ` David Brown
2011-05-14 13:27                         ` Drew
2011-05-14 18:21                           ` David Brown
2011-05-13 17:43       ` Phil Turmel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.