* RAID10-reshape crashed due to memory allocation problem
@ 2014-08-08 16:10 Peter Koch
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Koch @ 2014-08-08 16:10 UTC (permalink / raw)
  To: linux-raid

Dear Neil,

thanks for your answers and the great job you are doing for all of us
mdraid-users.

> > Dear readers,
> >
> > seems like memory is leaking when a RAID10-array is reshaped.
>
> Only in Linux 3.14.
> You need commit cc13b1d1500656a20e41960668f3392dda9fa6e2
> which is in v3.14.16
>
> So even though you didn't tell me what kernel version you were using, I
> knew:-)
>
> (always report mdadm version and kernel version).

And of course you are right. In order to reshape our raid10-array
I upgraded to mdadm 3.3 and the newest longterm kernel that was
available. That was a couple of weeks ago and the newest 3.14.x
kernel was 3.14.12 :-(
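For the record, here is the quick sanity check I should have done before
starting the reshape: compare the running kernel against the first release
containing the fix (a sketch; it assumes a plain x.y.z version string and
a sort that understands -V, as GNU and busybox sort do):

```shell
# Return success if version $1 >= version $2, using sort -V for
# numeric version ordering.
ver_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Strip any local suffix such as "-generic" before comparing.
current=$(uname -r | cut -d- -f1)
if ver_ge "$current" 3.14.16; then
    echo "kernel $current should contain the reshape fix"
else
    echo "kernel $current is older than 3.14.16 - upgrade before reshaping"
fi
```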

Kind regards

Peter Koch

* RAID10-reshape crashed due to memory allocation problem
@ 2014-08-08 16:46 Peter Koch
  2014-08-09  0:37 ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Koch @ 2014-08-08 16:46 UTC (permalink / raw)
  To: linux-raid

Dear Neil,

thanks for the info about the meaning of set-A and the difference
between Device-ID and RaidDevice.

> > Number   Major   Minor   RaidDevice State
> >    0       8       16        0      active sync set-A   /dev/sdb
> >    1       8      160        1      active sync set-B   /dev/sdk
> >    2       8       32        2      active sync set-A   /dev/sdc
> >    3       8      176        3      active sync set-B   /dev/sdl
> >    4       8       48        4      active sync set-A   /dev/sdd
> >    5       8      192        5      active sync set-B   /dev/sdm
> >    6       8       64        6      active sync set-A   /dev/sde
> >    7       8      208        7      active sync set-B   /dev/sdn
> >    8       8       80        8      active sync set-A   /dev/sdf
> >    9       8      224        9      active sync set-B   /dev/sdo
> >   10       8       96       10      active sync set-A   /dev/sdg
> >   11       8      240       11      active sync set-B   /dev/sdp
> >   12       8      112       12      active sync set-A   /dev/sdh
> >   14       8      128       13      active sync set-B   /dev/sdi
> >   13      65        0       14      active sync set-A   /dev/sdq
> >   15      65       16       15      active sync set-B   /dev/sdr
> 
> It's not "sync set-A", it is "active" and "sync" and "set-A".
> When you have a RAID10 that can be seen as two sets of devices where one set
> is mirrored to the other, they are labelled "set-A" and "set-B", just like
> you assumed.

The first time I noticed "set-A" and "set-B" in mdadm -D output
was after the reshape operation started. If I understand you
correctly, the real reason this information appeared was not
the reshape operation but growing the array from an odd number
of drives (13) to an even number (16).

> > What's the difference between column 1 and column 4 in
> > mdadm -D output?
>
> Column 1 identifies the device.  Column 4 gives the role that the device plays
> in the array.
> This seemed to make sense once, but it could be more confusing than helpful.

What irritates me most is the fact that /proc/mdstat output
lists the device id. For a RAID10-array with near-2 layout and
an even number of drives I was under the impression that data
is mirrored between the even-numbered and the odd-numbered
drives. In my case this is wrong: drives 0,2,4,6,8,10,12,13
are mirrored with drives 1,3,5,7,9,11,14,15.

Spare drives are not shown with their number but with their role
within the array (namely (S)). So why not print the role
for all drives instead of their device-id? I would also suggest
adding the set to the mdstat output, i.e.:

md5 : active raid10 sdb[A0] sdr[B7] sdq[A7] sdi[B6] sdh[A6] sdp[B5] sdg[A5] sdo[B4] sdf[A4] sdn[B3] sde[A3] sdm[B2] sdd[A2] sdl[B1] sdc[A1] sdk[B0]
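To make the pairing concrete, here is a small Python sketch. The
role-to-device mapping is copied from the mdadm -D output above; the
assumption (which matches what I observe) is that in a near-2 layout
role 2k mirrors role 2k+1, with even roles in set-A and odd roles in
set-B:

```python
# RaidDevice role -> device name, copied from the mdadm -D output above.
# Note roles 13 and 14: they belong to sdi (Number 14) and sdq
# (Number 13), so device Number and role differ there.
role_to_dev = [
    "sdb", "sdk", "sdc", "sdl", "sdd", "sdm", "sde", "sdn",
    "sdf", "sdo", "sdg", "sdp", "sdh", "sdi", "sdq", "sdr",
]

# Assumed near-2 pairing: each block is written to two consecutive
# roles, so role 2k and role 2k+1 mirror each other.
pairs = [(role_to_dev[r], role_to_dev[r + 1])
         for r in range(0, len(role_to_dev), 2)]

for a, b in pairs:
    print(f"{a} (set-A) <-> {b} (set-B)")
```

This reproduces exactly the mirroring I found: sdh pairs with sdi and
sdq with sdr, even though their device Numbers are 12/14 and 13/15.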

Is there any way to change the device-id?

Kind regards

Peter Koch

* RAID10-reshape crashed due to memory allocation problem
@ 2014-08-07 21:06 Peter Koch
  2014-08-08  2:35 ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Koch @ 2014-08-07 21:06 UTC (permalink / raw)
  To: linux-raid

Dear readers,

seems like memory is leaking when a RAID10-array is reshaped.

Here are the details of what I did:

A RAID10-array consisting of 13 disks (2TB each) and one spare was
grown into a 16-disk RAID10-array by adding two more spares and
then running  mdadm --grow /dev/md5 --raid-devices=16

When the reshape operation reached 80% (after 20 hours) the system
became unresponsive and crashed soon after. The console output showed
something like "... could not allocate memory block ..."

The machine has 32GB of RAM.

After the machine was rebooted the reshape operation was running
for 6 more hours and was followed by an 8 hour resync.

Everything seems to be OK now, but according to /proc/meminfo
only 15GB of RAM are available. That is much too low for a system
that is almost idle.

I will reboot the machine at our next maintenance window and
compare its available memory with the situation right now.

Is restarting the reshape operation after a crash really safe?
Should I check the correctness of my array somehow?
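I could presumably verify the array via md's sysfs scrub interface:
writing "check" to sync_action reads every stripe and compares the
mirror copies without rewriting anything, and mismatch_cnt afterwards
holds the number of differing sectors. A sketch (md5 is the array from
this thread; the write is guarded so nothing happens on a machine
where the array is absent):

```shell
# Trigger a read-only consistency check on the array and report the
# mismatch count. Progress is visible in /proc/mdstat while it runs.
MD=md5
if [ -w "/sys/block/$MD/md/sync_action" ]; then
    echo check > "/sys/block/$MD/md/sync_action"
    # After the check finishes, a non-zero value here means the
    # mirror halves disagree somewhere:
    cat "/sys/block/$MD/md/mismatch_cnt"
else
    echo "array $MD not present on this machine"
fi
```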

Kind regards

Peter Koch



Thread overview: 5+ messages
2014-08-08 16:10 RAID10-reshape crashed due to memory allocation problem Peter Koch
2014-08-08 16:46 Peter Koch
2014-08-09  0:37 ` NeilBrown
2014-08-07 21:06 Peter Koch
2014-08-08  2:35 ` NeilBrown
