linux-raid.vger.kernel.org archive mirror
From: Vojtech Myslivec <vojtech@xmyslivec.cz>
To: Chris Murphy <lists@colorremedies.com>,
	Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	Linux-RAID <linux-raid@vger.kernel.org>,
	Michal Moravec <michal.moravec@logicworks.cz>,
	Song Liu <songliubraving@fb.com>
Subject: Re: Linux RAID with btrfs stuck and consume 100 % CPU
Date: Wed, 12 Aug 2020 16:19:31 +0200	[thread overview]
Message-ID: <442d5127-11f0-80ca-5914-1a561bb2c292@xmyslivec.cz> (raw)
In-Reply-To: <CAJCQCtQAHr91wEwvFmh_-UB3Cd3UecSjjy6w7nOeqUktrn4UzQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2848 bytes --]



On 29. 07. 20 23:48, Chris Murphy wrote:
> On Wed, Jul 29, 2020 at 3:06 PM Guoqing Jiang
> <guoqing.jiang@cloud.ionos.com> wrote:
>> On 7/22/20 10:47 PM, Vojtech Myslivec wrote:
>>> 1. What should be the cause of this problem?
>>
>> Just a quick glance based on the stacks which you attached, I guess it
>> could be
>> a deadlock issue of raid5 cache super write.
>>
>> Maybe the commit 8e018c21da3f ("raid5-cache: fix a deadlock in superblock
>> write") didn't fix the problem completely.  Cc Song.
> 
> That references discards, and it makes me take another look at mdadm -D,
> which shows a journal device:
> 
>        0     253        2        -      journal   /dev/dm-2
> 
> Vojtech, can you confirm this device is an SSD? There are a couple
> SSDs that show up in the dmesg if I recall correctly.

I tried to explain this in my first post. It's a logical volume in a
volume group on top of a RAID 1 array over 2 SSDs.

My colleague replied with more details:

On 05. 08. 2020 Michal Moravec wrote:
>> On 29 Jul 2020, Chris Murphy wrote:
>> Vojtech, can you confirm this device is an SSD? There are a couple
>> SSDs that show up in the dmesg if I recall correctly.
>
> Yes. We have a pair (sdg, sdh) of INTEL D3-S4610 240 GB SSDs
> (SSDSC2KG240G8). We use them for the OS and the raid6 journal.
> They are configured as the md0 raid1 array with LVM on top of it.
> The logical volume vg0-journal_md1 (1G in size) is used as the journal
> device for the md1 array (where the problem with the md1_raid6 process
> consuming 100% CPU and blocking btrfs operations is happening).


>> What is the default discard hinting for this SSD when it's used as
>> a journal device for mdadm?
>
> What do you mean by discard hinting?
> We have an issue_discards = 1 setting in /etc/lvm/lvm.conf
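
[a side note]: As far as I know, issue_discards in lvm.conf only controls
discards that LVM itself issues when extents are freed (e.g. lvremove);
discards from upper layers pass through regardless, but only if every
layer of the stack advertises discard support. One way to check that,
as a sketch (device names taken from this thread, adjust to your setup):

```shell
# Non-zero DISC-GRAN/DISC-MAX means that layer accepts discard requests;
# zeros anywhere in the chain mean discards stop there.
lsblk --discard /dev/sdg
lsblk --discard /dev/md0
lsblk --discard /dev/mapper/vg0-journal_md1
```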


>> And what is the write behavior of the journal?
>
> That would be journal_mode set to write-through, right?
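
[a side note]: On kernels with raid5-cache support the journal write
policy is exposed through sysfs, so it can be confirmed directly (path
assumes the array is md1, as in this thread):

```shell
# Prints "write-through" (the default) or "write-back".
cat /sys/block/md1/md/journal_mode
# The mode can be switched at runtime; shown only as the available knob,
# not as a recommendation:
# echo write-back > /sys/block/md1/md/journal_mode
```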


>> I'm not familiar with this feature at all, whether it's treated as a
>> raw block device for the journal or if the journal resides on a file
>> system.
>
> From the lsblk output I see no filesystem on vg0-journal_md1. It looks
> like a plain logical volume to me.

[my comment]: yes, it's an LV block device with no filesystem on it.


>> So I get kinda curious what might happen long term if this is a very
>> busy file system, very busy raid5/6 journal on this SSD, without any
>> discard hints?
>> Is it possible the SSD runs out of ready-to-write erase blocks, and
>> the firmware has become super slow doing erasure/garbage collection
>> on demand?
>> And the journal is now having a hard time flushing?
>
> What kind of information could we gather to verify/reject any of these
> ideas?


[my question]: Is the LVM configuration (above) enough? Sadly, there is
not much information about RAID 6 journaling on the kernel wiki. There is
some info in mdadm(8), but nothing about discard/trim operations.
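
If it helps, here is what we could collect; this is my own guess at
useful data points for the "SSD out of ready-to-erase blocks" theory,
not something asked for in the thread (device names are ours):

```shell
# SSD wear indicators and media/error counters for both journal SSDs:
smartctl -a /dev/sdg
smartctl -a /dev/sdh
# Array and journal device state:
mdadm --detail /dev/md1
cat /proc/mdstat
# Whether a manual trim of the btrfs filesystem completes, and how much
# it actually discards (slow or hanging fstrim would support the theory):
fstrim -v /mnt/data
```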

[-- Attachment #2: lsblk-output.txt --]
[-- Type: text/plain, Size: 1076 bytes --]

NAME                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdg                     8:96   1 223,6G  0 disk  
├─sdg1                  8:97   1  37,3G  0 part  
│ └─md0                 9:0    0  37,2G  0 raid1 
│   ├─vg0-swap        253:0    0   3,7G  0 lvm   [SWAP]
│   ├─vg0-root        253:1    0  14,9G  0 lvm   /
│   └─vg0-journal_md1 253:2    0     1G  0 lvm   
│     └─md1             9:1    0  29,1T  0 raid6 /mnt/data
├─sdg2                  8:98   1     1K  0 part  
└─sdg5                  8:101  1 186,3G  0 part  
sdh                     8:112  1 223,6G  0 disk  
├─sdh1                  8:113  1  37,3G  0 part  
│ └─md0                 9:0    0  37,2G  0 raid1 
│   ├─vg0-swap        253:0    0   3,7G  0 lvm   [SWAP]
│   ├─vg0-root        253:1    0  14,9G  0 lvm   /
│   └─vg0-journal_md1 253:2    0     1G  0 lvm   
│     └─md1             9:1    0  29,1T  0 raid6 /mnt/data
├─sdh2                  8:114  1     1K  0 part  
└─sdh5                  8:117  1 186,3G  0 part  
