linux-btrfs.vger.kernel.org archive mirror
From: Vojtech Myslivec <vojtech@xmyslivec.cz>
To: Chris Murphy <lists@colorremedies.com>
Cc: Song Liu <songliubraving@fb.com>,
	Michal Moravec <michal.moravec@logicworks.cz>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	Linux-RAID <linux-raid@vger.kernel.org>
Subject: Re: Linux RAID with btrfs stuck and consume 100 % CPU
Date: Wed, 16 Sep 2020 11:42:18 +0200
Message-ID: <5e79d1f8-7632-48ef-de56-9e79cba87434@xmyslivec.cz>
In-Reply-To: <CAJCQCtQWh2JBAL_SDRG-gMd9Z1TXad7aKjZVUGdY1Akj7fn5Qg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]

Hello,

it seems my last e-mail was filtered as I can't find it in the archives.
So I will resend it and include all attachments in one tarball.


On 26. 08. 20 20:07, Chris Murphy wrote:
> OK so from the attachments..
>
> cat /proc/<pid>/stack for md1_raid6
>
> [<0>] rq_qos_wait+0xfa/0x170
> [<0>] wbt_wait+0x98/0xe0
> [<0>] __rq_qos_throttle+0x23/0x30
> [<0>] blk_mq_make_request+0x12a/0x5d0
> [<0>] generic_make_request+0xcf/0x310
> [<0>] submit_bio+0x42/0x1c0
> [<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
> [<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
> [<0>] md_thread+0x94/0x150 [md_mod]
> [<0>] kthread+0x112/0x130
> [<0>] ret_from_fork+0x22/0x40
>
>
> Btrfs snapshot flushing might instigate the problem but it seems to me
> there's some kind of contention or blocking happening within md, and
> that's why everything stalls. But I can't tell why.
>
> Do you have any iostat output at the time of this problem? I'm
> wondering if md is waiting on disks. If not, try `iostat -dxm 5` and
> share a few minutes before and after the freeze/hang.

We detected the issue on Monday 31.08.2020 at 15:24. It must have happened
sometime between 15:22 and 15:24, as we monitor the state every 2 minutes.
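
The check behind that two-minute monitoring is not described in this message. As a rough illustration only, one way to spot tasks stuck in uninterruptible sleep (state `D`) is to parse `/proc/<pid>/stat`. The function names below are hypothetical, and this is a sketch under the assumption of a Linux `/proc`, not the monitoring actually in use:

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' to find where the state field begins.
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for every task currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `blocked_tasks()` from a periodic job would give the PIDs whose `/proc/<pid>/stack` is worth saving; reading that file normally requires root.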

We recorded the stacks of the blocked processes, the sysrq+w output, and
the requested `iostat` output. Then, at 15:45, we manually "unstuck" the
array by accessing the md1 device with the dd command (reading a few
random blocks).
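
A minimal sketch of that kind of read, for illustration only: it is not the dd invocation that was actually run, and the function name, block size, and block count are assumptions. It reads a few blocks at random block-aligned offsets from a device or file:

```python
import os
import random


def read_random_blocks(path, count=4, block_size=4096, seed=None):
    """Read `count` blocks at random aligned offsets; return the offsets read."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of block devices and regular files.
        size = os.lseek(fd, 0, os.SEEK_END)
        total_blocks = size // block_size
        if total_blocks == 0:
            raise ValueError("target is smaller than one block")
        offsets = []
        for _ in range(count):
            offset = rng.randrange(total_blocks) * block_size
            data = os.pread(fd, block_size, offset)
            if len(data) != block_size:
                raise OSError("short read at offset %d" % offset)
            offsets.append(offset)
        return offsets
    finally:
        os.close(fd)
```

For example, `read_random_blocks("/dev/md1")` run as root would touch four random 4 KiB blocks. These reads go through the page cache, so some of them may never reach the array.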

I hope the attached file names are self-explanatory.

Please let me know if we can do anything more to track down the issue or
if I forgot something.

Thanks a lot,
Vojtech and Michal



Description of the devices in iostat, as a recap:
- sda-sdf: 6 HDD disks
- sdg, sdh: 2 SSD disks

- md0: raid1 over sdg1 and sdh1 ("SSD RAID", Physical Volume for LVM)
- md1: our "problematic" raid6 over sda-sdf ("HDD RAID", btrfs
       formatted)

- Logical volumes over md0 Physical Volume (on SSD RAID)
    - dm-0: 4G  LV for SWAP
    - dm-1: 16G LV for root file system (ext4 formatted)
    - dm-2: 1G  LV for md1 journal


[-- Attachment #2: mdraid-btrfs-issue.tgz --]
[-- Type: application/x-compressed-tar, Size: 43544 bytes --]
