* RAID6 and crashes
@ 2010-06-10 18:02 Miles Fidelman
  2010-06-10 18:57 ` Roman Mamedov
  0 siblings, 1 reply; 26+ messages in thread
From: Miles Fidelman @ 2010-06-10 18:02 UTC (permalink / raw)
  To: linux-raid

Hi Folks,

I just recently converted a server from a basic Debian Lenny 
installation to a virtualized platform (Debian Lenny, Xen 3, Debian 
Lenny DomUs).

I also converted my underlying disk environment from RAID1 to a mix of 
RAID1 (for Dom0) and RAID6/LVM/DRBD for the domUs.  All the RAID is 
implemented using md.  (Yes I realize there's a performance hit - but it 
seemed like a good idea at the time, and with volumes mounted with 
"noatime" the performance is acceptable, though I'm sort of thinking now 
of moving to RAID10).

Anyway, I'm still working out some instabilities in my virtualized 
environment, and I seem to have a crash/reboot event maybe once a day 
(still trying to track that down).

In some, but not all cases, I find the machine comes up with the RAID6 
volume marked dirty, and an automatic resync gets initiated - which 
takes several hours to complete, and drags performance way down while 
it's going on.

Which leads to two questions:

1. Are there any known problems with md-based RAID6 that might, 
themselves, lead to a crash/reboot? (I always suspect complicated, 
low-level functions that are critical to everything).

2.  Are there any settings that can reduce the likelihood of a RAID 
volume being dirty after a crash?  (The crash/reboot isn't that much of 
a problem - the several hours of degraded performance ARE a problem.)

Thanks Very Much,

Miles Fidelman

-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes
  2010-06-10 18:02 RAID6 and crashes Miles Fidelman
@ 2010-06-10 18:57 ` Roman Mamedov
  2010-06-10 21:22   ` RAID6 and crashes (reporting back re. --bitmap) Miles Fidelman
  0 siblings, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2010-06-10 18:57 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On Thu, 10 Jun 2010 14:02:42 -0400
Miles Fidelman <mfidelman@meetinghouse.net> wrote:
 
> 2.  Are there any settings that can reduce the likelihood of a RAID 
> volume being dirty after a crash?  (The crash/reboot isn't that much of 
> a problem - the several hours of degraded performance ARE a problem.)

Do you currently have a write intent bitmap in the array? I think it can
reduce the need for recovery by an order of magnitude in some cases. Check man
mdadm for --bitmap if you don't use it yet.
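A minimal sketch of what enabling one looks like (the device name /dev/md3 is an
example; check man mdadm for the exact options in your version):

```shell
# Add an internal write-intent bitmap to a running array (example device):
mdadm --grow /dev/md3 --bitmap=internal

# /proc/mdstat should now show a "bitmap:" line for the array.
cat /proc/mdstat
```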

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-10 18:57 ` Roman Mamedov
@ 2010-06-10 21:22   ` Miles Fidelman
  2010-06-10 21:41     ` Roman Mamedov
  0 siblings, 1 reply; 26+ messages in thread
From: Miles Fidelman @ 2010-06-10 21:22 UTC (permalink / raw)
  Cc: linux-raid

Roman Mamedov wrote:
> On Thu, 10 Jun 2010 14:02:42 -0400
> Miles Fidelman<mfidelman@meetinghouse.net>  wrote:
>
>    
>> 2.  Are there any settings that can reduce the likelihood of a RAID
>> volume being dirty after a crash?  (The crash/reboot isn't that much of
>> a problem - the several hours of degraded performance ARE a problem.)
>>      
> Do you currently have a write intent bitmap in the array? I think it can
> reduce the need for recovery by an order of magnitude in some cases. Check man
> mdadm for --bitmap if you don't use it yet.
>    
Just went through the process of turning it on for all my arrays.  
Incredibly painless and quick.  Now I get to wait and see if it helps 
the next time I have a crash/reboot event.

Thanks very much!

Miles


-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-10 21:22   ` RAID6 and crashes (reporting back re. --bitmap) Miles Fidelman
@ 2010-06-10 21:41     ` Roman Mamedov
  2010-06-10 22:40       ` Miles Fidelman
  0 siblings, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2010-06-10 21:41 UTC (permalink / raw)
  To: Miles Fidelman; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1092 bytes --]

On Thu, 10 Jun 2010 17:22:19 -0400
Miles Fidelman <mfidelman@meetinghouse.net> wrote:

> > Do you currently have a write intent bitmap in the array? I think it can
> > reduce the need for recovery by an order of magnitude in some cases. Check
> > man mdadm for --bitmap if you don't use it yet.
> >    
> Just went through the process of turning it on for all my arrays.  
> Incredibly painless and quick.  Now I get to wait and see if it helps 
> the next time I have a crash/reboot event.

I assume you went with an "internal" bitmap; in that case, if you notice that
write speed on the arrays has become significantly lower, the first thing to
look at is increasing the --bitmap-chunk size (I use 131072).

It is possible to use an external bitmap on an independent device (which has
almost zero performance impact), but in this case it could be non-trivial to
100% ensure that such a device is mounted and accessible at the moment during
boot-up when md arrays are being started, especially if one of those arrays
also hosts the root FS.
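For completeness, a sketch of the external variant (the file path is an
example; the bitmap file must live on a filesystem outside the array it
tracks, and mdadm of this era generally expects it on ext2/ext3):

```shell
# Keep the write-intent bitmap in a file on an independent device:
mdadm --grow /dev/md3 --bitmap=/var/lib/md3-bitmap
```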

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-10 21:41     ` Roman Mamedov
@ 2010-06-10 22:40       ` Miles Fidelman
  2010-06-11  2:51         ` Roman Mamedov
  0 siblings, 1 reply; 26+ messages in thread
From: Miles Fidelman @ 2010-06-10 22:40 UTC (permalink / raw)
  Cc: linux-raid

Roman Mamedov wrote:
> On Thu, 10 Jun 2010 17:22:19 -0400
> Miles Fidelman<mfidelman@meetinghouse.net>  wrote:
>
>    
>>> Do you currently have a write intent bitmap in the array? I think it can
>>> reduce the need for recovery by an order of magnitude in some cases. Check
>>> man mdadm for --bitmap if you don't use it yet.
>>>
>>>        
>> Just went through the process of turning it on for all my arrays.
>> Incredibly painless and quick.  Now I get to wait and see if it helps
>> the next time I have a crash/reboot event.
>>      
> I assume you went with "internal" bitmap, in which case if you notice that
> write speed on the arrays became significantly lower, the first thing you
> should look at is increasing the --bitmap-chunk size (I use 131072).
>    
Now you tell me :-)

Yes... went with internal.

I'll keep an eye on write performance.  Do you happen to know, off hand, 
a magic incantation to change the bitmap-chunk size? (Do I need to 
remove the bitmap I just set up and reinstall one with the larger chunk 
size?)
> It is possible to use an external bitmap on an independent device (which has
> almost zero performance impact), but in this case it could be non-trivial to
> 100% ensure that such a device is mounted and accessible at the moment during
> boot-up when md arrays are being started, especially if one of those arrays
> also hosts the root FS.
>    
I think I'll stick with internal.

Thanks again,

Miles


-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-10 22:40       ` Miles Fidelman
@ 2010-06-11  2:51         ` Roman Mamedov
  2010-06-11  4:31           ` Graham Mitchell
  2010-06-11  4:46           ` Miles Fidelman
  0 siblings, 2 replies; 26+ messages in thread
From: Roman Mamedov @ 2010-06-11  2:51 UTC (permalink / raw)
  To: Miles Fidelman; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 453 bytes --]

On Thu, 10 Jun 2010 18:40:11 -0400
Miles Fidelman <mfidelman@meetinghouse.net> wrote:

> Yes... went with internal.
> 
> I'll keep an eye on write performance.  Do you happen to know, off hand, 
> a magic incantation to change the bitmap-chunk size? (Do I need to 
> remove the bitmap I just set up and reinstall one with the larger chunk 
> size?)

Remove (--bitmap=none) then add again with new --bitmap-chunk.
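Spelled out as commands (device name and chunk size are examples):

```shell
# Drop the existing bitmap, then re-create it with a larger chunk:
mdadm --grow /dev/md3 --bitmap=none
mdadm --grow /dev/md3 --bitmap=internal --bitmap-chunk=131072
```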

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  2:51         ` Roman Mamedov
@ 2010-06-11  4:31           ` Graham Mitchell
  2010-06-11  4:41             ` Roman Mamedov
                               ` (2 more replies)
  2010-06-11  4:46           ` Miles Fidelman
  1 sibling, 3 replies; 26+ messages in thread
From: Graham Mitchell @ 2010-06-11  4:31 UTC (permalink / raw)
  To: linux-raid

Can you do this on a live array, or can it only be done (as the docs seem to
suggest) with the create, build and grow options?


G

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Roman Mamedov
> Sent: Thursday, June 10, 2010 10:51 PM
> To: Miles Fidelman
> Cc: linux-raid@vger.kernel.org
> Subject: Re: RAID6 and crashes (reporting back re. --bitmap)
> 
> On Thu, 10 Jun 2010 18:40:11 -0400
> Miles Fidelman <mfidelman@meetinghouse.net> wrote:
> 
> > Yes... went with internal.
> >
> > I'll keep an eye on write performance.  Do you happen to know, off
> > hand, a magic incantation to change the bitmap-chunk size? (Do I need
> > to remove the bitmap I just set up and reinstall one with the larger
> > chunk
> > size?)
> 
> Remove (--bitmap=none) then add again with new --bitmap-chunk.
> 
> --
> With respect,
> Roman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  4:31           ` Graham Mitchell
@ 2010-06-11  4:41             ` Roman Mamedov
  2010-06-11 12:13               ` Graham Mitchell
  2010-06-11  4:42             ` Miles Fidelman
  2010-06-11  4:50             ` Neil Brown
  2 siblings, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2010-06-11  4:41 UTC (permalink / raw)
  To: Graham Mitchell; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 644 bytes --]

On Fri, 11 Jun 2010 00:31:57 -0400
"Graham Mitchell" <gmitch@woodlea.com> wrote:

> Can you do this on a live array, or can it only be done (as the docs seem to
> suggest), with the create, build and grow options?

It is a variant of the grow operation, but it can be done on a live array,
even mounted, and completes instantly:

mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=131072

In my experience, removing the bitmap (setting it to none) may occasionally
fail (probably when the array has a lot of outstanding write requests), but
just try again when it's a bit quieter, and it'll work.

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  4:31           ` Graham Mitchell
  2010-06-11  4:41             ` Roman Mamedov
@ 2010-06-11  4:42             ` Miles Fidelman
  2010-06-11  4:50             ` Neil Brown
  2 siblings, 0 replies; 26+ messages in thread
From: Miles Fidelman @ 2010-06-11  4:42 UTC (permalink / raw)
  Cc: linux-raid

Graham Mitchell wrote:
> Can you do this on a live array, or can it only be done (as the docs seem to
> suggest), with the create, build and grow options?
>
>
>    
I just did it on a live array.  Some --grow options, including --bitmap, 
seem to work on live arrays.

Miles

-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  2:51         ` Roman Mamedov
  2010-06-11  4:31           ` Graham Mitchell
@ 2010-06-11  4:46           ` Miles Fidelman
  2010-06-11  4:55             ` Roman Mamedov
  2010-06-11  5:08             ` Neil Brown
  1 sibling, 2 replies; 26+ messages in thread
From: Miles Fidelman @ 2010-06-11  4:46 UTC (permalink / raw)
  Cc: linux-raid

Roman Mamedov wrote:
> On Thu, 10 Jun 2010 18:40:11 -0400
> Miles Fidelman<mfidelman@meetinghouse.net>  wrote:
>
>    
>> Yes... went with internal.
>>
>> I'll keep an eye on write performance.  Do you happen to know, off hand,
>> a magic incantation to change the bitmap-chunk size? (Do I need to
>> remove the bitmap I just set up and reinstall one with the larger chunk
>> size?)
>>      
> Remove (--bitmap=none) then add again with new --bitmap-chunk.
>
>    
Looks like my original --bitmap internal creation set a very large chunk 
size initially:

md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
       947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
       bitmap: 6/226 pages [24KB], 1024KB chunk

unless that --bitmap-chunk=131072 recommendation translates to 
131072KB (if so, are you really running 131MB chunks?)

Miles

-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  4:31           ` Graham Mitchell
  2010-06-11  4:41             ` Roman Mamedov
  2010-06-11  4:42             ` Miles Fidelman
@ 2010-06-11  4:50             ` Neil Brown
  2010-06-13 14:28               ` Bernd Schubert
  2 siblings, 1 reply; 26+ messages in thread
From: Neil Brown @ 2010-06-11  4:50 UTC (permalink / raw)
  To: Graham Mitchell; +Cc: linux-raid

On Fri, 11 Jun 2010 00:31:57 -0400
"Graham Mitchell" <gmitch@woodlea.com> wrote:

> Can you do this on a live array, or can it only be done (as the docs seem to
> suggest), with the create, build and grow options?
> 

As 'grow' can (and must) be used on a live array, your question doesn't
quite make sense.
Yes: it can be done on a live array.

NeilBrown

> 
> G
> 
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > owner@vger.kernel.org] On Behalf Of Roman Mamedov
> > Sent: Thursday, June 10, 2010 10:51 PM
> > To: Miles Fidelman
> > Cc: linux-raid@vger.kernel.org
> > Subject: Re: RAID6 and crashes (reporting back re. --bitmap)
> > 
> > On Thu, 10 Jun 2010 18:40:11 -0400
> > Miles Fidelman <mfidelman@meetinghouse.net> wrote:
> > 
> > > Yes... went with internal.
> > >
> > > I'll keep an eye on write performance.  Do you happen to know, off
> > > hand, a magic incantation to change the bitmap-chunk size? (Do I need
> > > to remove the bitmap I just set up and reinstall one with the larger
> > > chunk
> > > size?)
> > 
> > Remove (--bitmap=none) then add again with new --bitmap-chunk.
> > 
> > --
> > With respect,
> > Roman
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  4:46           ` Miles Fidelman
@ 2010-06-11  4:55             ` Roman Mamedov
  2010-06-11 20:26               ` Miles Fidelman
  2010-06-11  5:08             ` Neil Brown
  1 sibling, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2010-06-11  4:55 UTC (permalink / raw)
  To: Miles Fidelman; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

On Fri, 11 Jun 2010 00:46:47 -0400
Miles Fidelman <mfidelman@meetinghouse.net> wrote:

> Looks like my original --bitmap internal creation set a very large chunk 
> size initially
> 
> md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
>        947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
>        bitmap: 6/226 pages [24KB], 1024KB chunk
> 
> unless that --bitmap-chunk=131072 recommendation is translates to 
> 131072KB (if so, are you really running 131MB chunks?)

Yes, this is correct.
This only means that after an unclean shutdown, areas of the array at least
128MB in size will be invalidated for a resync, rather than smaller areas at
1MB granularity as on yours currently. 128 megabytes is just about 1
second of read throughput on modern drives, so I am okay with that. Several
128MB windows here and there are still faster to resync than the whole array.
And this had an extremely good effect on write performance for me (increased it
by more than 1.5x) compared to a small chunk. Test for yourself, first without
the bitmap, then with various chunk sizes of it (ensure there's no other load
on the array, and note the speeds):

dd if=/dev/zero of=/your-raid/zerofile bs=1M count=2048 conv=notrunc,fdatasync
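One way to script that comparison (device name, mount point and chunk sizes
are examples; run it only on an otherwise idle array):

```shell
# Re-create the bitmap at several chunk sizes and time a 2 GiB write each time.
for chunk in 8192 32768 131072; do
    mdadm --grow /dev/md3 --bitmap=none
    mdadm --grow /dev/md3 --bitmap=internal --bitmap-chunk=$chunk
    echo "chunk=${chunk}KiB:"
    dd if=/dev/zero of=/your-raid/zerofile bs=1M count=2048 conv=notrunc,fdatasync
done
```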

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  4:46           ` Miles Fidelman
  2010-06-11  4:55             ` Roman Mamedov
@ 2010-06-11  5:08             ` Neil Brown
  2010-06-11 11:10               ` John Hendrikx
  1 sibling, 1 reply; 26+ messages in thread
From: Neil Brown @ 2010-06-11  5:08 UTC (permalink / raw)
  To: Miles Fidelman; +Cc: linux-raid

On Fri, 11 Jun 2010 00:46:47 -0400
Miles Fidelman <mfidelman@meetinghouse.net> wrote:

> Roman Mamedov wrote:
> > On Thu, 10 Jun 2010 18:40:11 -0400
> > Miles Fidelman<mfidelman@meetinghouse.net>  wrote:
> >
> >    
> >> Yes... went with internal.
> >>
> >> I'll keep an eye on write performance.  Do you happen to know, off hand,
> >> a magic incantation to change the bitmap-chunk size? (Do I need to
> >> remove the bitmap I just set up and reinstall one with the larger chunk
> >> size?)
> >>      
> > Remove (--bitmap=none) then add again with new --bitmap-chunk.
> >
> >    
> Looks like my original --bitmap internal creation set a very large chunk 
> size initially
> 
> md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
>        947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
>        bitmap: 6/226 pages [24KB], 1024KB chunk
> 
> unless that --bitmap-chunk=131072 recommendation is translates to 
> 131072KB (if so, are you really running 131MB chunks?)

Yes, and 131MB (128MiB) is probably a little on the large side, but not
excessively so and may well be a very good number.

My current rule-of-thumb is that the bitmap chunk size should be about the
amount of data that can be written sequentially in 1 second. 131MB is maybe 2
seconds with today's technology, so it is close enough.

The idea is that normally if your filesystems provides fairly good locality,
you should not have very many bits in the bitmap set.  Probably 10s, possibly
100s.

If this is the case, and each takes 1 second to resync, then resync time is
limited to a few minutes.

Smaller chunks might reduce this to less than a minute, but that probably
isn't worth it.  Conversely smaller chunks will tend to mean more updates to
the bitmap, so slower writes all the time.

On a 1TB drive there are 7500 131MB chunks.  So assuming a relatively small
number of bits set at a time, this will reduce resync time by a factor of
somewhere between 200 and 1000.  Hours shrink to a few minutes.  This is
probably enough for most situations.
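The arithmetic behind these estimates can be checked directly in shell
(assuming a 10^12-byte drive and the 131072KiB chunk size discussed above):

```shell
# --bitmap-chunk is given in KiB: 131072 KiB is 128 MiB per chunk.
echo $(( 131072 / 1024 ))                     # prints 128

# Number of 128MiB chunks on a 1TB (10^12-byte) drive -- roughly the 7500 above:
echo $(( 1000000000000 / (131072 * 1024) ))   # prints 7450
```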

I would be really interested to find out if my assumption of small numbers of
bits set is valid.   You can find out the number of bits set at any instant
with  "mdadm -X" run on some component of the array.

If anyone is able to report some samples of that number along with array
size / level / layout / number of devices etc and some guide to the workload,
it might be helpful in validating my rule-of-thumb.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  5:08             ` Neil Brown
@ 2010-06-11 11:10               ` John Hendrikx
  2010-06-11 11:50                 ` Roman Mamedov
  2010-06-11 12:25                 ` Graham Mitchell
  0 siblings, 2 replies; 26+ messages in thread
From: John Hendrikx @ 2010-06-11 11:10 UTC (permalink / raw)
  To: Neil Brown; +Cc: Miles Fidelman, linux-raid

Neil Brown wrote:
> On Fri, 11 Jun 2010 00:46:47 -0400
> Miles Fidelman <mfidelman@meetinghouse.net> wrote:
>
>   
>> Roman Mamedov wrote:
>>     
>>> On Thu, 10 Jun 2010 18:40:11 -0400
>>> Miles Fidelman<mfidelman@meetinghouse.net>  wrote:
>>>
>>>    
>>>       
>>>> Yes... went with internal.
>>>>
>>>> I'll keep an eye on write performance.  Do you happen to know, off hand,
>>>> a magic incantation to change the bitmap-chunk size? (Do I need to
>>>> remove the bitmap I just set up and reinstall one with the larger chunk
>>>> size?)
>>>>      
>>>>         
>>> Remove (--bitmap=none) then add again with new --bitmap-chunk.
>>>
>>>    
>>>       
>> Looks like my original --bitmap internal creation set a very large chunk 
>> size initially
>>
>> md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
>>        947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
>>        bitmap: 6/226 pages [24KB], 1024KB chunk
>>
>> unless that --bitmap-chunk=131072 recommendation is translates to 
>> 131072KB (if so, are you really running 131MB chunks?)
>>     
>
> Yes, and 131MB (128MiB) is probably a little on the large side, but not
> excessively so and may well be a very good number.
>   
I'm using --bitmap-chunk=131072 as well, with the same reasoning as you 
outlined in your post.  The bitmap will be small and require few updates 
while still providing a huge reduction in resync times.

> On a 1TB drive there are 7500 131MB chunks.  So assuming a relatively small
> number of bits set at a time, this will reduce resync time by a factor of
> somewhere between 200 and 1000.  Hours become fewer minutes.  This is
> probably enough for most situations.
>
> I would be really interested to find out if my assumption of small numbers of
> bits set is valid.   You can find out the number of bits set at any instant
> with  "mdadm -X" run on some component of the array.
>   
I was interested as well, so I ran this command:

 > mdadm -X /dev/md2

and this is the result(??):

        Filename : /dev/md2
           Magic : d747992c
mdadm: invalid bitmap magic 0xd747992c, the bitmap file appears to be 
corrupted
         Version : 1132474982
mdadm: unknown bitmap version 1132474982, either the bitmap file is 
corrupted or you need to upgrade your tools

 > cat /proc/mdstat

Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md3 : active raid6 sdd1[7] sda1[0] sdj1[6] sdc1[3] sdg1[2] sdb1[1]
      3867871232 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] 
[UUUUUU]
      bitmap: 0/4 pages [0KB], 131072KB chunk

md2 : active raid6 sde1[7] sdi1[6] sdh1[3] sdg3[2] sdb2[1] sda2[0]
      3867871232 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] 
[UUUUUU]
      bitmap: 0/4 pages [0KB], 131072KB chunk

md0 : active raid1 sdg2[0] hda1[1]
      9767424 blocks [2/2] [UU]
     
unused devices: <none>

I upgraded to the latest available mdadm (in debian unstable) and it has 
the same results (for both arrays).

 > mdadm --version
mdadm - v3.1.2 - 10th March 2010

 > uname -a
Linux Ukyo 2.6.27.5 #1 SMP PREEMPT Sun Nov 9 08:32:40 CET 2008 i686 
GNU/Linux

Is this normal? :)  Both arrays were freshly created a few days ago, 
with mdadm v3.0.3...
> If anyone is able to report some samples of that number along with array
> size / level / layout / number of devices etc and some guide to the workload,
> it might be helpful in validating my rule-of-thumb.
>
> Thanks,
> NeilBrown
>
>   

--John


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11 11:10               ` John Hendrikx
@ 2010-06-11 11:50                 ` Roman Mamedov
  2010-06-11 12:29                   ` Graham Mitchell
  2010-06-11 12:25                 ` Graham Mitchell
  1 sibling, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2010-06-11 11:50 UTC (permalink / raw)
  To: John Hendrikx; +Cc: Neil Brown, Miles Fidelman, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]

On Fri, 11 Jun 2010 13:10:23 +0200
John Hendrikx <hjohn@xs4all.nl> wrote:

> > I would be really interested to find out if my assumption of small numbers
> > of bits set is valid.   You can find out the number of bits set at any
> > instant with  "mdadm -X" run on some component of the array.
> >   
> I was interested as well, so I ran this command:
> 
>  > mdadm -X /dev/md2
> 
> and this is the result(??):
> 
>         Filename : /dev/md2
>            Magic : d747992c
> mdadm: invalid bitmap magic 0xd747992c, the bitmap file appears to be 
> corrupted
>          Version : 1132474982
> mdadm: unknown bitmap version 1132474982, either the bitmap file is 
> corrupted or you need to upgrade your tools

I stumbled in the same way initially, but then re-read more closely and
noticed that Neil said to run it "on some component of the array",
e.g. /dev/sdxN, not the array itself -- and that way it worked fine. However,
as my array sees almost no write load at the moment, I have no useful results
to report.

-- 
With respect,
Roman

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  4:41             ` Roman Mamedov
@ 2010-06-11 12:13               ` Graham Mitchell
  0 siblings, 0 replies; 26+ messages in thread
From: Graham Mitchell @ 2010-06-11 12:13 UTC (permalink / raw)
  To: linux-raid

Thanks to everyone for replying, it is indeed a simple operation and
completes immediately. I was just worried since I equate grow more to an
array reshape than to a management/tune option. I guess it's just my
paranoia showing through... :)


G

> -----Original Message-----
> From: Roman Mamedov [mailto:roman@rm.pp.ru]
> Sent: Friday, June 11, 2010 12:42 AM
> To: Graham Mitchell
> Cc: linux-raid@vger.kernel.org
> Subject: Re: RAID6 and crashes (reporting back re. --bitmap)
> 
> On Fri, 11 Jun 2010 00:31:57 -0400
> "Graham Mitchell" <gmitch@woodlea.com> wrote:
> 
> > Can you do this on a live array, or can it only be done (as the docs
> > seem to suggest), with the create, build and grow options?
> 
> It is a variant of the grow operation, but it can be done on a live array,
even
> mounted, and completes instantly:
> 
> mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=131072
> 
> In my experience, removing the bitmap (setting it to none) may
occasionally
> fail (probably when the array has a lot of outstanding write requests),
but
> just try again when it's a bit quieter, and it'll work.
> 
> --
> With respect,
> Roman


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11 11:10               ` John Hendrikx
  2010-06-11 11:50                 ` Roman Mamedov
@ 2010-06-11 12:25                 ` Graham Mitchell
  1 sibling, 0 replies; 26+ messages in thread
From: Graham Mitchell @ 2010-06-11 12:25 UTC (permalink / raw)
  To: linux-raid
  Cc: 'Miles Fidelman', 'Neil Brown', 'John Hendrikx'

> I was interested as well, so I ran this command:
> 


I also was interested, and did mdadm -X /dev/md0, and this is my output...

mdadm -X /dev/md0
        Filename : /dev/md0
           Magic : 00000000
mdadm: invalid bitmap magic 0x0, the bitmap file appears to be corrupted
         Version : 0
mdadm: unknown bitmap version 0, either the bitmap file is corrupted or you
need to upgrade your tools



mdadm --version
mdadm - v3.0.3 - 22nd October 2009


uname -a
Linux file00bert.woodlea.org.uk 2.6.32.12-115.fc12.i686.PAE #1 SMP Fri Apr
30 20:14:08 UTC 2010 i686 i686 i386 GNU/Linux

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 sdh1[0] sda1[14] sdp1[13] sdg1[12] sdo1[11] sdf1[10]
sdk1[9] sdn1[8] sde1[7] sdj1[6] sdm1[5] sdd1[4] sdi1[3] sdl1[2] sdc1[1]
      6348985344 blocks super 1.2 level 6, 512k chunk, algorithm 2 [15/15]
[UUUUUUUUUUUUUUU]
      bitmap: 0/2 pages [0KB], 131072KB chunk

unused devices: <none>





^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11 11:50                 ` Roman Mamedov
@ 2010-06-11 12:29                   ` Graham Mitchell
  0 siblings, 0 replies; 26+ messages in thread
From: Graham Mitchell @ 2010-06-11 12:29 UTC (permalink / raw)
  To: linux-raid
  Cc: 'Neil Brown', 'Miles Fidelman',
	'Roman Mamedov', 'John Hendrikx'

> I stumbled in the same way initially, but then re-read more closely and
> noticed that Neil said to run it "on some component of the array", e.g.
> /dev/sdxN, not the array itself -- and that way it worked fine. However as
my
> array sees almost no write load at the moment, I have no useful results to
> report.
> 
> --
> With respect,
> Roman

Duh.... You are correct, here's the output from one of my disks

mdadm -X /dev/sdn1
        Filename : /dev/sdn1
           Magic : 6d746962
         Version : 4
            UUID : 1470c671:4236b155:67287625:899db153
          Events : 10584
  Events Cleared : 10584
           State : OK
       Chunksize : 128 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 488383488 (465.76 GiB 500.10 GB)
          Bitmap : 3727 bits (chunks), 0 dirty (0.0%)
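As a sanity check, the chunk count in that output follows from the sync size
by ceiling division (both figures in KiB):

```shell
# 488383488 KiB of sync size at 131072 KiB per chunk rounds up to 3727 chunks:
echo $(( (488383488 + 131072 - 1) / 131072 ))   # prints 3727
```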


Like you, I've no load on it at the moment, but I do have a couple of GB to
copy onto it today, so I'll see if I can get some more figures.

I think I need some tea...:)


G


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  4:55             ` Roman Mamedov
@ 2010-06-11 20:26               ` Miles Fidelman
  0 siblings, 0 replies; 26+ messages in thread
From: Miles Fidelman @ 2010-06-11 20:26 UTC (permalink / raw)
  Cc: linux-raid

FYI:  I just:
- removed the bitmap
- installed a new bitmap with larger chunk-size
on 4 arrays, on each of two machines (redundant high-availability 
cluster setup)

Took me all of about 5 minutes, of which most of the time was waiting 
for virtual machines to migrate from one machine to the other, and then 
back.

All seems to be working, performance seems just a little snappier - but 
who can really tell until the next time an array rebuilds.

Thanks all (and Roman in particular) for your guidance.

Miles

Roman Mamedov wrote:
> On Fri, 11 Jun 2010 00:46:47 -0400
> Miles Fidelman<mfidelman@meetinghouse.net>  wrote:
>
>    
>> Looks like my original --bitmap internal creation set a very large chunk
>> size initially
>>
>> md3 : active raid6 sda4[0] sdd4[3] sdc4[2] sdb4[1]
>>         947417088 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
>>         bitmap: 6/226 pages [24KB], 1024KB chunk
>>
>> unless that --bitmap-chunk=131072 recommendation is translates to
>> 131072KB (if so, are you really running 131MB chunks?)
>>      
> Yes, this is correct.
> This will only mean that after an unclean shutdown, at least 128MB-sized
> areas of the array will be invalidated for a resync, and not smaller areas
> with 1MB-granularity like on yours currently. 128 megabytes is just about 1
> second of read throughput on modern drives, so I am okay with that. Several
> 128MB-windows here and there are still faster to resync than the whole array.
> And this had an extremely good effect on write performance for me (increased it
> by more than 1.5x) compared to a small chunk. Test for yourself, first without
> the bitmap, then with various chunk sizes of it (ensure there's no other load
> on the array, and note the speeds):
>
> dd if=/dev/zero of=/your-raid/zerofile bs=1M count=2048 conv=notrunc,fdatasync
>
>    
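(Roman's dd test can be looped over a few bitmap chunk sizes to compare write
throughput. A rough sketch, assuming /dev/md3 mounted at /mnt/raid and an
otherwise idle array:)

```shell
# Compare sequential write speed with no bitmap and with several
# internal-bitmap chunk sizes (values in KiB). Idle array only.
for chunk in none 4096 32768 131072; do
    mdadm --grow /dev/md3 --bitmap=none        # always start from no bitmap
    if [ "$chunk" != none ]; then
        mdadm --grow /dev/md3 --bitmap=internal --bitmap-chunk="$chunk"
    fi
    echo "bitmap chunk: $chunk"
    dd if=/dev/zero of=/mnt/raid/zerofile bs=1M count=2048 conv=notrunc,fdatasync
    rm -f /mnt/raid/zerofile
done
```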


-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra




* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-11  4:50             ` Neil Brown
@ 2010-06-13 14:28               ` Bernd Schubert
  2010-06-13 23:05                 ` Neil Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Bernd Schubert @ 2010-06-13 14:28 UTC (permalink / raw)
  To: Neil Brown, linux-raid

On Friday 11 June 2010, Neil Brown wrote:
> On Fri, 11 Jun 2010 00:31:57 -0400
> 
> "Graham Mitchell" <gmitch@woodlea.com> wrote:
> > Can you do this on a live array, or can it only be done (as the docs seem
> > to suggest), with the create, build and grow options?
> 
As 'grow' can (and must) be used on a live array, your question doesn't
> exactly make sense.
> Yes: it can be done on a live array.

While I have done this myself a couple of times, I still do not understand 
where it takes the disk space for the bitmap journal from. Is this space 
reserved by mdadm for this purpose?


Thanks,
Bernd


* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-13 14:28               ` Bernd Schubert
@ 2010-06-13 23:05                 ` Neil Brown
  2010-06-14  9:01                   ` Bernd Schubert
  2010-06-14  9:14                   ` Roman Mamedov
  0 siblings, 2 replies; 26+ messages in thread
From: Neil Brown @ 2010-06-13 23:05 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: linux-raid

On Sun, 13 Jun 2010 16:28:34 +0200
Bernd Schubert <bernd.schubert@fastmail.fm> wrote:

> On Friday 11 June 2010, Neil Brown wrote:
> > On Fri, 11 Jun 2010 00:31:57 -0400
> > 
> > "Graham Mitchell" <gmitch@woodlea.com> wrote:
> > > Can you do this on a live array, or can it only be done (as the docs seem
> > > to suggest), with the create, build and grow options?
> > 
> > As 'grow' can (and must) be used on a live array, your question doesn't
> > exactly make sense.
> > Yes: it can be done on a live array.
> 
> While I have done this myself a couple of times, I still do not understand 
> where it takes the disk space for the bitmap journal from. Is this space 
> reserved by mdadm for this purpose?

Sort-of.
It uses space that the alignment requirements of the metadata assure us is
otherwise unused.
For v0.90, that is limited to 60K.  For 1.x it is 3K.
With recent kernels it is possible for mdadm to tell the kernel where to put
the bitmap (rather than the kernel *knowing*) so mdadm could use other space
that was reserved when the array was created, but I haven't implemented that
in mdadm yet.

NeilBrown


* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-13 23:05                 ` Neil Brown
@ 2010-06-14  9:01                   ` Bernd Schubert
  2010-06-14  9:14                   ` Roman Mamedov
  1 sibling, 0 replies; 26+ messages in thread
From: Bernd Schubert @ 2010-06-14  9:01 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Monday 14 June 2010, Neil Brown wrote:
> On Sun, 13 Jun 2010 16:28:34 +0200
> 
> Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
> > On Friday 11 June 2010, Neil Brown wrote:
> > > On Fri, 11 Jun 2010 00:31:57 -0400
> > >
> > > "Graham Mitchell" <gmitch@woodlea.com> wrote:
> > > > Can you do this on a live array, or can it only be done (as the docs
> > > > seem to suggest), with the create, build and grow options?
> > >
> > > As 'grow' can (and must) be used on a live array, your question
> > > doesn't exactly make sense.
> > > Yes: it can be done on a live array.
> >
> > While I have done this myself a couple of times, I still do not
> > understand where it takes the disk space for the bitmap journal from. Is
> > this space reserved by mdadm for this purpose?
> 
> Sort-of.
> It uses space that the alignment requirements of the metadata assure us is
> otherwise unused.
> For v0.90, that is limited to 60K.  For 1.x it is 3K.
> With recent kernels it is possible for mdadm to tell the kernel where to
>  put the bitmap (rather than the kernel *knowing*) so mdadm could use other
>  space that was reserved when the array was created, but I haven't
>  implemented that in mdadm yet.

Thanks a lot, Neil! I added this information to the raid wiki

https://raid.wiki.kernel.org/index.php/Bitmap#Used_disk_space_for_bitmaps


Cheers,
Bernd


* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-13 23:05                 ` Neil Brown
  2010-06-14  9:01                   ` Bernd Schubert
@ 2010-06-14  9:14                   ` Roman Mamedov
  2010-06-14  9:47                     ` Neil Brown
  1 sibling, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2010-06-14  9:14 UTC (permalink / raw)
  To: Neil Brown; +Cc: Bernd Schubert, linux-raid


On Mon, 14 Jun 2010 09:05:20 +1000
Neil Brown <neilb@suse.de> wrote:

> > While I have done this myself a couple of times, I still do not understand 
> > where it takes the disk space for the bitmap journal from. Is this space
> > reserved by mdadm for this purpose?
> 
> Sort-of.
> It uses space that the alignment requirements of the metadata assure us is
> otherwise unused.
> For v0.90, that is limited to 60K.  For 1.x it is 3K.

I have now:

md0 : active raid5 sdf3[3] sde3[1] sda3[0]
      3887004672 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 1/8 pages [4KB], 131072KB chunk

Metadata is 1.2, and the internal bitmap is 8 pages, which is 32K, not 3K.
Did I misunderstand something, or perhaps 3K was a typo?

-- 
With respect,
Roman



* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-14  9:14                   ` Roman Mamedov
@ 2010-06-14  9:47                     ` Neil Brown
  2010-06-14 11:53                       ` Roman Mamedov
  0 siblings, 1 reply; 26+ messages in thread
From: Neil Brown @ 2010-06-14  9:47 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Bernd Schubert, linux-raid

On Mon, 14 Jun 2010 15:14:25 +0600
Roman Mamedov <roman@rm.pp.ru> wrote:

> On Mon, 14 Jun 2010 09:05:20 +1000
> Neil Brown <neilb@suse.de> wrote:
> 
> > > While I have done this myself a couple of times, I still do not understand 
> > > where it takes the disk space for the bitmap journal from. Is this space
> > > reserved by mdadm for this purpose?
> > 
> > Sort-of.
> > It uses space that the alignment requirements of the metadata assure us is
> > otherwise unused.
> > For v0.90, that is limited to 60K.  For 1.x it is 3K.
> 
> I have now:
> 
> md0 : active raid5 sdf3[3] sde3[1] sda3[0]
>       3887004672 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>       bitmap: 1/8 pages [4KB], 131072KB chunk
> 
> Metadata is 1.2, and the internal bitmap is 8 pages, which is 32K, not 3K.
> Did I misunderstand something, or perhaps 3K was a typo?
> 

The pages used to store the bitmap internally use 16 bits per bitmap-chunk,
to count how many active IO requests to the chunk there are.  So it is
potentially 16 times the size of the bitmap stored on disk.  For that reason
we free pages for which all chunks are idle.  In your case, only one of the 8
pages currently has any active chunks.

There are 3887004672 / 131072, or about 29655, chunks - hence that many bits.
29655/8 is about 3706 bytes, which you will notice is still larger than 3K.
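That arithmetic can be checked directly with shell integer math, using the
figures from Roman's /proc/mdstat output above:

```shell
# Array size in 1 KiB blocks (from /proc/mdstat) and bitmap chunk in KiB.
ARRAY_KIB=3887004672
CHUNK_KIB=131072                     # 128 MiB bitmap chunk

CHUNKS=$((ARRAY_KIB / CHUNK_KIB))    # one bit per chunk in the on-disk bitmap
BYTES=$((CHUNKS / 8))                # on-disk bitmap size, in bytes

echo "chunks=$CHUNKS on-disk-bytes=$BYTES"   # chunks=29655 on-disk-bytes=3706
```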

When you create an array and specify that a bitmap be added at the same time,
there is more flexibility for the size and location of the bitmap.  It can easily
be more than 3K in that case.

So presumably this array was created with a bitmap, rather than created
without a bitmap and had a bitmap added later with --grow.  Correct?

NeilBrown


* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-14  9:47                     ` Neil Brown
@ 2010-06-14 11:53                       ` Roman Mamedov
  2010-06-14 21:24                         ` Neil Brown
  0 siblings, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2010-06-14 11:53 UTC (permalink / raw)
  To: Neil Brown; +Cc: Bernd Schubert, linux-raid


On Mon, 14 Jun 2010 19:47:42 +1000
Neil Brown <neilb@suse.de> wrote:

> When you create an array and specify that a bitmap be added at the same time,
> there is more flexibility for the size and location of the bitmap.  It can easily
> be more than 3K in that case.
> 
> So presumably this array was created with a bitmap, rather than created
> without a bitmap and had a bitmap added later with --grow.  Correct?

Yes, as far as I remember. However, it seems a bit unfortunate to have any
significant difference between adding the bitmap right when creating the
array, and adding it later. For that reason, I'd suggest reserving more space
in the metadata than 3K, even if the bitmap isn't requested - and even if it
won't be added later, then that space could prove useful for something else
that might require it later. Maybe 64, 128 or 256K - still minuscule compared
to the array size, and could provide some nice flexibility for the future.

-- 
With respect,
Roman



* Re: RAID6 and crashes (reporting back re. --bitmap)
  2010-06-14 11:53                       ` Roman Mamedov
@ 2010-06-14 21:24                         ` Neil Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Neil Brown @ 2010-06-14 21:24 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Bernd Schubert, linux-raid

On Mon, 14 Jun 2010 17:53:27 +0600
Roman Mamedov <roman@rm.pp.ru> wrote:

> On Mon, 14 Jun 2010 19:47:42 +1000
> Neil Brown <neilb@suse.de> wrote:
> 
> > When you create an array and specify that a bitmap be added at the same time,
> > there is more flexibility for the size and location of the bitmap.  It can easily
> > be more than 3K in that case.
> > 
> > So presumably this array was created with a bitmap, rather than created
> > without a bitmap and had a bitmap added later with --grow.  Correct?
> 
> Yes, as far as I remember. However, it seems a bit unfortunate to have any
> significant difference between adding the bitmap right when creating the
> array, and adding it later. For that reason, I'd suggest reserving more space
> in the metadata than 3K, even if the bitmap isn't requested - and even if it
> won't be added later, then that space could prove useful for something else
> that might require it later. Maybe 64, 128 or 256K - still miniscule compared
> to the array size, and could provide some nice flexibility for the future.
> 

Yes. And I'm fairly sure mdadm does always reserve space.
The point is that (until very recently) it wasn't possible to tell the kernel
where to add a bitmap to an active array - just that it should add one.
So it could only add it at a place that it was certain would be usable.
That is the place I described.
It is now possible to give the kernel more details of the bitmap to add.  I
just need to teach mdadm how to choose the best space and how to tell the
kernel about it.

NeilBrown


end of thread, other threads:[~2010-06-14 21:24 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-10 18:02 RAID6 and crashes Miles Fidelman
2010-06-10 18:57 ` Roman Mamedov
2010-06-10 21:22   ` RAID6 and crashes (reporting back re. --bitmap) Miles Fidelman
2010-06-10 21:41     ` Roman Mamedov
2010-06-10 22:40       ` Miles Fidelman
2010-06-11  2:51         ` Roman Mamedov
2010-06-11  4:31           ` Graham Mitchell
2010-06-11  4:41             ` Roman Mamedov
2010-06-11 12:13               ` Graham Mitchell
2010-06-11  4:42             ` Miles Fidelman
2010-06-11  4:50             ` Neil Brown
2010-06-13 14:28               ` Bernd Schubert
2010-06-13 23:05                 ` Neil Brown
2010-06-14  9:01                   ` Bernd Schubert
2010-06-14  9:14                   ` Roman Mamedov
2010-06-14  9:47                     ` Neil Brown
2010-06-14 11:53                       ` Roman Mamedov
2010-06-14 21:24                         ` Neil Brown
2010-06-11  4:46           ` Miles Fidelman
2010-06-11  4:55             ` Roman Mamedov
2010-06-11 20:26               ` Miles Fidelman
2010-06-11  5:08             ` Neil Brown
2010-06-11 11:10               ` John Hendrikx
2010-06-11 11:50                 ` Roman Mamedov
2010-06-11 12:29                   ` Graham Mitchell
2010-06-11 12:25                 ` Graham Mitchell
