From: Manuel Riel <manu@snapdragon.cc>
To: Song Liu <song@kernel.org>
Cc: Linux-RAID <linux-raid@vger.kernel.org>,
	Vojtech Myslivec <vojtech@xmyslivec.cz>
Subject: Re: [PATCH] md: warn about using another MD array as write journal
Date: Sat, 20 Mar 2021 09:12:08 +0800
Message-ID: <27EE5CBC-B1B8-4463-87F5-2AE73F30941B@snapdragon.cc>
In-Reply-To: <CAPhsuW4=XoyQV_HNVnFnMWS2PvvU1+Rtbh9SJB-FQTO3haa3ig@mail.gmail.com>

On Mar 20, 2021, at 7:16 AM, Song Liu <song@kernel.org> wrote:
> 
> Sorry for being late on this issue.
> 
> Manuel and Vojtech, are we confident that this issue only happens when we use
> another md array as the journal device?
> 
> Thanks,
> Song

Hi Song,

Thanks for getting back.

Unfortunately it's still happening, even when using an NVMe partition directly as the journal. It just took a long three weeks to happen. So please discard my patch. Here is how it went down yesterday:

- the md4_raid6 process is running at 100% CPU utilization and all I/O to the array is blocked
- no disk activity on the physical drives
- a soft reboot doesn't work because md4_raid6 blocks, so a hard reset is needed
- when booting into rescue mode, it tries to assemble the array and shows the same 100% CPU utilization; a reboot is not possible from there either
- when manually assembling it *with* the journal drive, it reads a few GB from the journal device and then gets stuck at 100% CPU utilization again, without any disk activity

The solution in the end was to avoid assembling the array on reboot, then assemble it *without* the existing journal and add an empty journal drive later. This led to some data loss and a full resync.
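
For anyone hitting the same hang, the recovery steps above could look roughly like the sketch below. It only builds and prints the mdadm command lines and executes nothing. The device names and the helper `recovery_commands` are hypothetical, and the exact flag sequence is an assumption based on the mdadm man page (`--assemble --force`, `--readonly`, `--add-journal`, `--readwrite`), not necessarily the commands that were actually run here.

```python
import shlex


def recovery_commands(array, members, new_journal):
    """Return mdadm invocations as argv lists; nothing is executed here.

    Hypothetical helper: array, members and new_journal are placeholders
    that must be adapted to the machine at hand.
    """
    return [
        # Assemble from the data members only, leaving out the old journal.
        # Assumption: --force is required when the journal device is missing.
        ["mdadm", "--assemble", "--force", array, *members],
        # The man page states --add-journal only works on a read-only array;
        # the array may already be read-only after a forced assemble.
        ["mdadm", "--readonly", array],
        # Attach a fresh, empty journal device.
        ["mdadm", array, "--add-journal", new_journal],
        # Switch back to read-write so normal I/O can resume.
        ["mdadm", "--readwrite", array],
    ]


if __name__ == "__main__":
    # Placeholder device names, for illustration only.
    cmds = recovery_commands(
        "/dev/md4",
        ["/dev/sda1", "/dev/sdb1", "/dev/sdc1", "/dev/sdd1"],
        "/dev/nvme0n1p3",
    )
    for argv in cmds:
        print(shlex.join(argv))
```

Printing the commands first, rather than running them, leaves room to review each step. Dropping the old journal discards any writes that were still sitting in it, which is where the data loss mentioned above comes from.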

I'm currently moving all data off this machine and will repave it. Then I'll see if that changes anything.

My main OS is CentOS 8 and the rescue system was Debian. Both showed a similar issue. This must be connected to the journal drive somehow.

My journal drive is a partition of ~180 GB on an NVMe drive.

Thanks for any pointers on what I could try next.

Manu
