* Reshape stuck immediately, backup file all nulls
@ 2016-01-30 12:21 Björn Augustsson
  2016-01-30 17:09 ` Phil Turmel
  2016-01-30 18:13 ` Mikael Abrahamsson
  0 siblings, 2 replies; 7+ messages in thread
From: Björn Augustsson @ 2016-01-30 12:21 UTC (permalink / raw)
  To: linux-raid

Folks,

I wanted to add another disk to a RAID6 array I have. So I ran
$ mdadm --add /dev/md127 /dev/sdj1
$ mdadm --grow --raid-devices=8 --backup-file=/boot/grow_md127.bak  /dev/md127

This appeared to work right, but looking at /proc/mdstat, it says

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid6 sdd1[8] sdj[6] sdg[0] sdk[4] sdh[1] sdi[2] sdc[5] sda1[7]
      14650675200 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/8] [UUUUUUUU]
      [>....................]  reshape =  0.0% (1/2930135040) finish=445893299483.7min speed=0K/sec

unused devices: <none>

That is, it's stuck, and it has been that way since I started the reshape
(about 36 hours now).
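
A rough way to confirm that a reshape is stalled rather than just slow is
to sample the position counter twice and compare. The sketch below is not
from the original post: the helper name reshape_pos is made up for
illustration, and it assumes the /proc/mdstat layout shown above.

```shell
# Print the current reshape position (in blocks) for one md device,
# reading /proc/mdstat-formatted text on stdin. Prints nothing if the
# device is not reshaping. (reshape_pos is a hypothetical helper name.)
reshape_pos() {
    awk -v dev="$1" '
        $2 == ":"            { in_dev = ($1 == dev); next }  # device header line
        NF == 0              { in_dev = 0 }                  # blank line ends a device
        in_dev && /reshape =/ {
            # progress is printed as "(done/total)", e.g. "(1/2930135040)"
            if (match($0, /\([0-9]+\/[0-9]+\)/)) {
                split(substr($0, RSTART + 1, RLENGTH - 2), a, "/")
                print a[1]
            }
        }
    '
}

# Intended use (not run here):
#   a=$(reshape_pos md127 < /proc/mdstat); sleep 60
#   b=$(reshape_pos md127 < /proc/mdstat)
#   [ "$a" = "$b" ] && echo "reshape position has not moved"
```

If the two samples are identical and speed stays at 0K/sec, the kernel is
most likely waiting on something rather than working slowly.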

Looking at some logs, I found this in messages:

Jan 28 20:24:27 ooo systemd: Created slice system-mdadm\x2dgrow\x2dcontinue.slice.
Jan 28 20:24:27 ooo audit: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=mdadm-grow-continue@md127 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 28 20:24:27 ooo systemd: Starting system-mdadm\x2dgrow\x2dcontinue.slice.
Jan 28 20:24:28 ooo audit: AVC avc:  denied  { write } for  pid=11103 comm="mdadm" name="grow_md127.bak" dev="sdf1" ino=426 scontext=system_u:system_r:mdadm_t:s0 tcontext=unconfined_u:object_r:boot_t:s0 tclass=file permissive=0
Jan 28 20:24:28 ooo audit: SYSCALL arch=c000003e syscall=2 success=no exit=-13 a0=ec1fc0 a1=242 a2=180 a3=7800 items=0 ppid=1 pid=11103 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mdadm" exe="/usr/sbin/mdadm" subj=system_u:system_r:mdadm_t:s0 key=(null)
Jan 28 20:24:28 ooo systemd: mdadm-grow-continue@md127.service: Main process exited, code=exited, status=1/FAILURE
Jan 28 20:24:28 ooo systemd: mdadm-grow-continue@md127.service: Unit entered failed state.
Jan 28 20:24:28 ooo audit: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=mdadm-grow-continue@md127 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Jan 28 20:24:28 ooo systemd: mdadm-grow-continue@md127.service: Failed with result 'exit-code'.
Jan 28 20:24:32 ooo setroubleshoot: SELinux is preventing /usr/sbin/mdadm from write access on the file grow_md127.bak. For complete SELinux messages. run sealert -l 43815f80-8b00-40d9-86a3-4a6a432f3e05
Jan 28 20:24:32 ooo python3: SELinux is preventing /usr/sbin/mdadm from write access on the file grow_md127.bak.#012#012*****  Plugin kernel_modules (91.4 confidence) suggests ********************#012#012If you do not think mdadm should try write access on grow_md127.bak.#012Then you may be under attack by a hacker, since confined applications should not need this access.#012Do#012contact your security administrator and report this issue.#012#012*****  Plugin catchall (9.59 confidence) suggests **************************#012#012If you believe that mdadm should be allowed write access on the grow_md127.bak file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# grep mdadm /var/log/audit/audit.log | audit2allow -M mypol#012# semodule -i mypol.pp#012

So it seems selinux is preventing writes to the backup file I specified.
(I put it in /boot, since that's the only file system I have that's
not on the array.)
Interestingly, the file exists

$ ls -l /boot/grow_md127.bak
-rw-------. 1 root root 15732736 Jan 28 20:24 /boot/grow_md127.bak
$
but it's all nulls (as in the case at
http://www.spinics.net/lists/raid/msg40771.html )
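
To see at a glance which SELinux domain was denied access to which file
type, the AVC record can be reduced to its two type fields. This is only
an illustrative sketch: avc_types is a made-up helper name, and it
assumes the scontext=/tcontext= layout of the audit line quoted above.

```shell
# Print "source_type target_type" for each AVC line read on stdin.
# Contexts have the form user:role:type:level; only the type is kept.
# (avc_types is a hypothetical helper name.)
avc_types() {
    sed -n 's/.*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^: ]*\).*/\1 \2/p'
}

# Intended use (not run here), assuming the default audit log location
# that the setroubleshoot message itself refers to:
#   grep 'avc:  denied' /var/log/audit/audit.log | avc_types | sort | uniq -c
```

For the record in this message the result is "mdadm_t boot_t": a process
running in the mdadm_t domain tried to write a file labeled boot_t.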

The question is, what kind of state am I in now? And how should I recover?
Will just adding a policy to allow access to that file, and then
mdadm --grow --continue /dev/md127
fix it? Is the broken backup file going to be a problem?

The system is an up-to-date Fedora 23, x86_64, with kernel
4.3.3-303.fc23.x86_64 and mdadm-3.3.4-2.fc23.x86_64.

Thanks,

/August.
-- 
Wrong on most accounts.  const Foo *foo; and Foo const *foo; mean the same: foo
being a pointer to const Foo.  const Foo const *foo; would mean the same but is
illegal (double const).  You are confusing this with Foo * const foo; and const
Foo * const foo; respectively. -David Kastrup, comp.os.linux.development.system


* Re: Reshape stuck immediately, backup file all nulls
  2016-01-30 12:21 Reshape stuck immediately, backup file all nulls Björn Augustsson
@ 2016-01-30 17:09 ` Phil Turmel
  2016-01-31 14:59   ` Björn Augustsson
  2016-02-11  4:22   ` NeilBrown
  2016-01-30 18:13 ` Mikael Abrahamsson
  1 sibling, 2 replies; 7+ messages in thread
From: Phil Turmel @ 2016-01-30 17:09 UTC (permalink / raw)
  To: Björn Augustsson, linux-raid

Hi Björn,

On 01/30/2016 07:21 AM, Björn Augustsson wrote:

> The question is, what kind of state am I in now?

The kernel is waiting for mdmon (mdadm as a background task) to step
through the stripes.  mdmon died and the kernel will wait forever.

> And how should I recover?
> Will just adding a policy to allow access to that file, and then
> mdadm --grow --continue /dev/md127
> fix it? 

Probably.  Others have fixed this by disabling selinux for the reshape.
Don't forget to specify --backup-file again on the command line.  (I
don't use selinux myself, so I can't be more specific.)

In the future, consider not using a backup file at all -- mdadm
generally leaves enough dead space on devices to avoid the need.

> Is the broken backup file going to be a problem?

Shouldn't be. Report back if it is.

Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Reshape stuck immediately, backup file all nulls
  2016-01-30 12:21 Reshape stuck immediately, backup file all nulls Björn Augustsson
  2016-01-30 17:09 ` Phil Turmel
@ 2016-01-30 18:13 ` Mikael Abrahamsson
  1 sibling, 0 replies; 7+ messages in thread
From: Mikael Abrahamsson @ 2016-01-30 18:13 UTC (permalink / raw)
  To: Björn Augustsson; +Cc: linux-raid


On Sat, 30 Jan 2016, Björn Augustsson wrote:

> That is, it's stuck. And it's been that way since (about 36h now)

Look in the list archives. Some people have had luck with issuing 
--continue to the array to get it going again.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: Reshape stuck immediately, backup file all nulls
  2016-01-30 17:09 ` Phil Turmel
@ 2016-01-31 14:59   ` Björn Augustsson
  2016-02-01  1:02     ` George Rapp
  2016-02-11  4:22   ` NeilBrown
  1 sibling, 1 reply; 7+ messages in thread
From: Björn Augustsson @ 2016-01-31 14:59 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On Sat, Jan 30, 2016 at 6:09 PM, Phil Turmel <philip@turmel.org> wrote:
> Hi Björn,
>
> On 01/30/2016 07:21 AM, Björn Augustsson wrote:
>
>> The question is, what kind of state am I in now?
>
> The kernel is waiting for mdmon (mdadm as a background task) to step
> through the stripes.  mdmon died and the kernel will wait forever.

OK. So I think the root cause here is that my invocation of mdadm was
unconfined, because I was running it interactively, so it could create
the backup file. But the background version starts via systemd and runs
as mdadm_t, which can't write to that file (because it's in /boot and
labeled boot_t). And poof.

I'll file a bug against the Fedora SELinux policy for this, too.

>> And how should I recover?
>> Will just adding a policy to allow access to that file, and then
>> mdadm --grow --continue /dev/md127
>> fix it?
>
> Probably.  Others have fixed this by disabling selinux for the reshape.
>  Don't forget to specify --backup-file again on the command line.  (I
> don't use selinux myself, so I can't be more specific.)

Yeah, this seems to be working.
mdadm --grow --continue --backup-file=/boot/grow_md127.bak /dev/md127
is running, and the counters in /proc/mdstat started moving. It's going to be
a day or two for this to complete, but right now things are looking good.
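
For anyone hitting the same thing, the manual restart can be wrapped in a
small dry-run helper. This is a sketch, not something that was run
verbatim here: resume_reshape and the RUN switch are made-up names, the
unit name is the one from the log earlier in the thread, and by default
the function only prints the commands so they can be checked first.

```shell
# Print (or, with RUN=1, execute) the commands used to look at the failed
# helper unit and to restart the reshape by hand.
# (resume_reshape and RUN are hypothetical names for this sketch.)
resume_reshape() {
    dev=$1        # e.g. /dev/md127
    backup=$2     # the same backup file given to the original --grow
    run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

    # Show why the systemd-started background mdadm exited.
    run systemctl status "mdadm-grow-continue@${dev##*/}.service"
    # Continue the reshape, naming the backup file again.
    run mdadm --grow --continue --backup-file="$backup" "$dev"
}

# Dry run, prints two command lines:
#   resume_reshape /dev/md127 /boot/grow_md127.bak
```

Running with RUN=1 needs root, and the interactive mdadm then has to keep
running until the reshape is done.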

Thanks for the help!

/August.

> In the future, consider not using a backup file at all -- mdadm
> generally leaves enough dead space on devices to avoid the need.
>
>> Is the broken backup file going to be a problem?
>
> Shouldn't be. Report back if it is.
>
> Phil




* Re: Reshape stuck immediately, backup file all nulls
  2016-01-31 14:59   ` Björn Augustsson
@ 2016-02-01  1:02     ` George Rapp
  2016-02-01 18:45       ` Björn Augustsson
  0 siblings, 1 reply; 7+ messages in thread
From: George Rapp @ 2016-02-01  1:02 UTC (permalink / raw)
  To: Björn Augustsson; +Cc: Phil Turmel, Linux-RAID

On Sun, Jan 31, 2016 at 9:59 AM, Björn Augustsson <oggust@gmail.com> wrote:
>
> On Sat, Jan 30, 2016 at 6:09 PM, Phil Turmel <philip@turmel.org> wrote:
> > Hi Björn,
> >
> > The kernel is waiting for mdmon (mdadm as a background task) to step
> > through the stripes.  mdmon died and the kernel will wait forever.
>
> OK. So I think the root cause here is that my invocation of mdadm was
> unconfined,
> because I was running it interactively. So it could create the backup file.
> But the background version starts via systemd, and runs as mdadm_t, which
> can't read that file (because it's in /boot). And poof.
>
> I'll file a bug vs the fedora selinux policy for this, too.


Thanks for researching the SELinux vs. mdadm conflict in Fedora that
bit me as well (http://marc.info/?l=linux-raid&m=145349072305613&w=2).
If your Fedora selinux bug ticket makes it into general distribution,
lots of other users will be helped.

George
-- 
George Rapp  (Pataskala, OH) Home: george.rapp -- at -- gmail.com
LinkedIn profile: https://www.linkedin.com/in/georgerapp
Phone: +1 740 936 RAPP (740 936 7277)

* Re: Reshape stuck immediately, backup file all nulls
  2016-02-01  1:02     ` George Rapp
@ 2016-02-01 18:45       ` Björn Augustsson
  0 siblings, 0 replies; 7+ messages in thread
From: Björn Augustsson @ 2016-02-01 18:45 UTC (permalink / raw)
  To: George Rapp; +Cc: Phil Turmel, Linux-RAID

On Mon, Feb 1, 2016 at 2:02 AM, George Rapp <george.rapp@gmail.com> wrote:
> On Sun, Jan 31, 2016 at 9:59 AM, Björn Augustsson <oggust@gmail.com> wrote:
>>
>> On Sat, Jan 30, 2016 at 6:09 PM, Phil Turmel <philip@turmel.org> wrote:
>> > Hi Björn,
>> >
>> > The kernel is waiting for mdmon (mdadm as a background task) to step
>> > through the stripes.  mdmon died and the kernel will wait forever.
>>
>> OK. So I think the root cause here is that my invocation of mdadm was
>> unconfined,
>> because I was running it interactively. So it could create the backup file.
>> But the background version starts via systemd, and runs as mdadm_t, which
>> can't read that file (because it's in /boot). And poof.
>>
>> I'll file a bug vs the fedora selinux policy for this, too.
>
>
> Thanks for researching the SELinux vs. mdadm conflict in Fedora that
> bit me as well (http://marc.info/?l=linux-raid&m=145349072305613&w=2).
> If your Fedora selinux bug ticket makes it into general distribution,
> lots of other users will be helped.

marc.info appears to be down, at least for me.
Searching some more, I found this, though, and I guess this is it; it
sounds like exactly the same problem:
http://www.spinics.net/lists/raid/msg50735.html

I filed the bug with Fedora, at
https://bugzilla.redhat.com/show_bug.cgi?id=1303650

Let's hope this gets resolved. Storage/RAID issues like this make me nervous.

/August.

> George
> --
> George Rapp  (Pataskala, OH) Home: george.rapp -- at -- gmail.com
> LinkedIn profile: https://www.linkedin.com/in/georgerapp
> Phone: +1 740 936 RAPP (740 936 7277)




* Re: Reshape stuck immediately, backup file all nulls
  2016-01-30 17:09 ` Phil Turmel
  2016-01-31 14:59   ` Björn Augustsson
@ 2016-02-11  4:22   ` NeilBrown
  1 sibling, 0 replies; 7+ messages in thread
From: NeilBrown @ 2016-02-11  4:22 UTC (permalink / raw)
  To: Phil Turmel, Björn Augustsson, linux-raid


On Sun, Jan 31 2016, Phil Turmel wrote:

> Hi Björn,
>
> On 01/30/2016 07:21 AM, Björn Augustsson wrote:
>
>> The question is, what kind of state am I in now?
>
> The kernel is waiting for mdmon (mdadm as a background task) to step
> through the stripes.  mdmon died and the kernel will wait forever.

Close, but not quite.

'mdmon' is a background task that mdadm *only* uses for externally
managed metadata: IMSM and DDF.

For a reshape like this mdadm needs a background "mdadm" task.  It used
to just fork, but in these enlightened days it asks systemd to run that
task.
As has already been observed, that failed due to selinux not
understanding.

So it was an 'mdadm' which exited, rather than an mdmon which died...

Maybe you already worked that out.

The days of backup files should be numbered.  New kernels and new mdadm
adjust the data-offset so no backup is needed.  In that case, no
background mdadm is needed either.
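
One way to get a feel for whether a member device has room for its data
offset to be moved is to look at what "mdadm --examine" reports for it.
The sketch below only filters that output; examine_offsets is a made-up
helper name, and it assumes the "Data Offset" and "Unused Space" lines
that recent mdadm prints for 1.x metadata. It does not decide anything
by itself.

```shell
# Pick the data offset and unused-space figures out of
# "mdadm --examine <member>" output read on stdin.
# (examine_offsets is a hypothetical helper name; the field labels are
# assumed to match what the installed mdadm prints.)
examine_offsets() {
    awk -F' : ' '
        /^ *Data Offset :/  { print "data_offset=" $2 }
        /^ *Unused Space :/ { print "unused=" $2 }
    '
}

# Intended use (not run here, needs root):
#   mdadm --examine /dev/sdj1 | examine_offsets
```

A member with no unused space before or after the data would presumably
still need a backup file for this kind of reshape.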

NeilBrown


>
>> And how should I recover?
>> Will just adding a policy to allow access to that file, and then
>> mdadm --grow --continue /dev/md127
>> fix it? 
>
> Probably.  Others have fixed this by disabling selinux for the reshape.
>  Don't forget to specify --backup-file again on the command line.  (I
> don't use selinux myself, so I can't be more specific.)
>
> In the future, consider not using a backup file at all -- mdadm
> generally leaves enough dead space on devices to avoid the need.
>
>> Is the broken backup file going to be a problem?
>
> Shouldn't be. Report back if it is.
>
> Phil
