From: Pavel <pavel2000@areainter.net>
To: Wol <antlists@youngman.org.uk>
Cc: linux-raid@vger.kernel.org
Subject: Re: Misbehavior of md-raid RAID on failed NVMe.
Date: Thu, 9 Jun 2022 01:16:06 +0700
Message-ID: <deacdcb9-d100-877a-40b4-42952731806c@areainter.net>
In-Reply-To: <b14b62c9-1494-935f-f9f0-43f8083e8547@youngman.org.uk>

08.06.2022 23:52, Wol wrote:
> On 08/06/2022 04:48, Pavel wrote:
>
> Did you dd the raid device (/dev/md0 for example), or the individual 
> nvme devices?

There was LVM on top of /dev/md0, and dd copied the data of the LVM volumes.

>> While the data was being transferred, the kernel started reporting I/O
>> errors on one of the NVMe devices (dmesg output below).
>> But md-raid did not react to them in any way. The RAID array did not go
>> into any failed state, and the "clean" state was reported all the time.
>
> This is actually normal, correct and expected behaviour. If the raid 
> layer does not report a problem to dd, the data should have copied 
> correctly. And raid really only reports problems if it gets write 
> failures.

Yes, but the data was not copied correctly.
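One way to confirm and localize such corruption, assuming the source LV is still readable and the copy is reachable as a file or block device, is to compare the two block by block. A minimal Python sketch; the function name `mismatched_chunks` and the 1 MiB block size are illustrative choices, not part of any existing tool:

```python
CHUNK = 1024 * 1024  # 1 MiB; an arbitrary illustrative block size


def mismatched_chunks(src_path, dst_path, chunk_size=CHUNK):
    """Return the byte offsets of chunks that differ between src and dst.

    Both paths may be regular files or block devices (e.g. an LV and its copy).
    A length difference also shows up as a mismatch in the final chunk.
    """
    bad = []
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        while True:
            a = src.read(chunk_size)
            b = dst.read(chunk_size)
            if not a and not b:  # both sides exhausted
                break
            if a != b:
                bad.append(offset)
            offset += chunk_size
    return bad
```

Reporting offsets rather than a single yes/no answer makes it possible to correlate the damaged regions with the sectors named in the kernel's I/O error messages.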

> Unfortunately, you're missing a lot of detail to help us diagnose the 
> problem. What raid level are you using, for starters. It sounds like 
> there is a problem, but as Mariusz implies, it looks like a faulty 
> NVMe device. And if that device is lying to Linux, as appears likely
> (my guess is that raid is trying to fix the data, and the drive is 
> just losing the writes),

Feel free to ask. RAID level: RAID 1, built on partitions of two NVMe
devices.

Yes, the drive is "just" losing the writes. But there is nothing "to fix" at
the RAID level.
From my point of view as a user, RAID should detect the loss and take
appropriate action (mark the device as failed).
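To see what md itself believes about each member, the array and per-device state that the md driver exposes in sysfs can be read directly. A minimal Python sketch, assuming the array is md0 (so its directory is /sys/block/md0/md) and that each member's `state` file holds comma-separated flags such as `in_sync` or `faulty`; the function name is illustrative:

```python
from pathlib import Path


def md_member_states(md_sysfs="/sys/block/md0/md"):
    """Return (array_state, {member_name: set_of_flags}) for one md array."""
    md = Path(md_sysfs)
    array_state = (md / "array_state").read_text().strip()  # e.g. "clean"
    members = {}
    # Each member appears as a dev-<name> directory, e.g. dev-nvme0n1p2.
    for dev_dir in sorted(md.glob("dev-*")):
        state_file = dev_dir / "state"
        if state_file.is_file():
            flags = state_file.read_text().strip().split(",")
            members[dev_dir.name[len("dev-"):]] = {f for f in flags if f}
    return array_state, members
```

In the situation described here one would expect `array_state` to read `clean` and neither member to carry `faulty`. A member that keeps logging write errors can still be failed by hand with `mdadm --manage /dev/md0 --fail <member>`, which is a workaround rather than a fix.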

I don't know whether the NVMe layer is lying to the kernel or not, but I clearly see
"I/O error, dev nvme0n1, sector 1297536456 op 0x1:(WRITE) flags 0x0 
phys_seg 1 prio class 0"
messages, and I take them to clearly mean a write failure.
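Since the block layer does log these failures, a monitoring script can at least surface them even when md does not react. A minimal Python sketch, assuming the kernel log text (e.g. from `dmesg` or `journalctl -k`) is fed in line by line and that messages follow the format of the line quoted above; the regex and the function name are illustrative, not an existing tool:

```python
import re
from collections import Counter

# Matches lines such as:
# "I/O error, dev nvme0n1, sector 1297536456 op 0x1:(WRITE) flags 0x0 ..."
IO_ERR = re.compile(
    r"I/O error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+) "
    r"op 0x(?P<op>[0-9a-fA-F]+):\((?P<opname>\w+)\)"
)


def write_errors(lines):
    """Count WRITE I/O errors per device and collect the failing sectors."""
    counts = Counter()
    sectors = {}
    for line in lines:
        m = IO_ERR.search(line)
        if m and m.group("opname") == "WRITE":
            dev = m.group("dev")
            counts[dev] += 1
            sectors.setdefault(dev, []).append(int(m.group("sector")))
    return counts, sectors
```

Fed the line quoted above, it reports one WRITE error on nvme0n1 at sector 1297536456; READ errors and unrelated lines are ignored.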

> then there is precious little we can do about it.

As a kernel user, I did all I could: I posted a report here.
As kernel developers, you can do a bit more than users.

Thanks for your answers.

Regards,
Pavel.



Thread overview: 6+ messages
2022-06-08  3:48 Misbehavior of md-raid RAID on failed NVMe Pavel
2022-06-08  8:32 ` Mariusz Tkaczyk
     [not found]   ` <8b0c4bf1-a165-95ca-9746-8ef7be46092e@areainter.net>
2022-06-08  9:11     ` Mariusz Tkaczyk
2022-06-08 16:52 ` Wol
2022-06-08 18:16   ` Pavel [this message]
     [not found]   ` <CAAMCDef5jamJa+um=DSM08CPdzoTvEQuFOdrGo7jiNivrNVbpg@mail.gmail.com>
2022-06-09  6:43     ` Pavel
