linux-btrfs.vger.kernel.org archive mirror
From: Chris Murphy <lists@colorremedies.com>
To: Vojtech Myslivec <vojtech@xmyslivec.cz>
Cc: Chris Murphy <lists@colorremedies.com>,
	Song Liu <songliubraving@fb.com>,
	Michal Moravec <michal.moravec@logicworks.cz>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	Linux-RAID <linux-raid@vger.kernel.org>
Subject: Re: Linux RAID with btrfs stuck and consume 100 % CPU
Date: Thu, 17 Sep 2020 11:08:13 -0600
Message-ID: <CAJCQCtTYAg-uNpk2WYv0QDWH+prfnDN5oKyKmvTVHjARu_w0Kw@mail.gmail.com>
In-Reply-To: <5e79d1f8-7632-48ef-de56-9e79cba87434@xmyslivec.cz>

On Wed, Sep 16, 2020 at 3:42 AM Vojtech Myslivec <vojtech@xmyslivec.cz> wrote:
>
> Hello,
>
> it seems my last e-mail was filtered as I can't find it in the archives.
> So I will resend it and include all attachments in one tarball.
>
>
> On 26. 08. 20 20:07, Chris Murphy wrote:
> > OK so from the attachments..
> >
> > cat /proc/<pid>/stack for md1_raid6
> >
> > [<0>] rq_qos_wait+0xfa/0x170
> > [<0>] wbt_wait+0x98/0xe0
> > [<0>] __rq_qos_throttle+0x23/0x30
> > [<0>] blk_mq_make_request+0x12a/0x5d0
> > [<0>] generic_make_request+0xcf/0x310
> > [<0>] submit_bio+0x42/0x1c0
> > [<0>] md_update_sb.part.71+0x3c0/0x8f0 [md_mod]
> > [<0>] r5l_do_reclaim+0x32a/0x3b0 [raid456]
> > [<0>] md_thread+0x94/0x150 [md_mod]
> > [<0>] kthread+0x112/0x130
> > [<0>] ret_from_fork+0x22/0x40
> >
> >
> > Btrfs snapshot flushing might instigate the problem but it seems to me
> > there's some kind of contention or blocking happening within md, and
> > that's why everything stalls. But I can't tell why.
> >
> > Do you have any iostat output at the time of this problem? I'm
> > wondering if md is waiting on disks. If not, try `iostat -dxm 5` and
> > share a few minutes before and after the freeze/hang.
>
> We detected the issue on Monday 31.08.2020 at 15:24. It must have
> happened sometime between 15:22 and 15:24, as we monitor the state
> every 2 minutes.
>
> We recorded stacks of the blocked processes, the sysrq+w output, and
> the requested `iostat` output. Then, at 15:45, we performed the manual
> "unstuck" procedure by accessing the md1 device with dd (reading a
> few random blocks).
>
> I hope the attached file names are self-explanatory.
>
> Please let me know if we can do anything more to track the issue or if I
> forgot something.
>
> Thanks a lot,
> Vojtech and Michal
>
>
>
> Description of the devices in iostat, just for recap:
> - sda-sdf: 6 HDD disks
> - sdg, sdh: 2 SSD disks
>
> - md0: raid1 over sdg1 and sdh1 ("SSD RAID", Physical Volume for LVM)
> - md1: our "problematic" raid6 over sda-sdf ("HDD RAID", btrfs
>        formatted)
>
> - Logical volumes over md0 Physical Volume (on SSD RAID)
>     - dm-0: 4G  LV for SWAP
>     - dm-1: 16G LV for root file system (ext4 formatted)
>     - dm-2: 1G  LV for md1 journal
>

It's kind of a complicated setup. When this problem happens, can you
check swap pressure?

/sys/fs/cgroup/memory.stat

pgfault and maybe also pgmajfault - see if they're going up; or you
can look at vmstat and see how heavily swap is being used at the
time.

The thing is, any heavy eviction means writes to dm-0 -> md0 raid1 ->
sdg+sdh, which are the same SSDs that your md1 raid6 mdadm journal
(dm-2) goes to. So if you have any kind of swap pressure, it will very
likely stall the journal or at least slow it down substantially. Then
you get blocked tasks as the pressure keeps building, because you have
a ton of dirty writes in Btrfs that can't make it to disk.
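
One way to see whether swap and the journal really compete for the
same SSDs is to compare their write rates side by side. The following
is only a rough sketch, not a tested tool: it assumes the device names
from your description (dm-0 for swap, dm-2 for the md1 journal) and
reads the sectors-written column of /proc/diskstats, which is counted
in 512-byte units. The function names are made up for illustration.

```python
import time


def parse_diskstats(text):
    """Map device name -> sectors written, from /proc/diskstats content."""
    written = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        # Columns: major minor name reads reads_merged sectors_read
        #          ms_reading writes writes_merged sectors_written ...
        written[fields[2]] = int(fields[9])
    return written


def write_mib_per_s(before, after, seconds, sector_bytes=512):
    """Write rate in MiB/s for every device present in both samples."""
    rates = {}
    for dev in before.keys() & after.keys():
        delta = after[dev] - before[dev]
        rates[dev] = delta * sector_bytes / seconds / 2**20
    return rates


def watch_writes(devices, interval=5, samples=12, path="/proc/diskstats"):
    """Print write rates for the given devices once per interval."""
    with open(path) as f:
        prev = parse_diskstats(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = parse_diskstats(f.read())
        rates = write_mib_per_s(prev, cur, interval)
        stamp = time.strftime("%H:%M:%S")
        print(stamp, {d: round(rates.get(d, 0.0), 2) for d in devices})
        prev = cur
```

Calling watch_writes(["dm-0", "dm-2"]) while the array is stuck would
show whether swap writes on dm-0 climb while journal writes on dm-2
drop toward zero, which is the pattern this explanation predicts.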

If there is minimal swap usage, then this hypothesis is false and
something else is going on. I also don't have an explanation for why
your workaround works.
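
To check the swap side of the hypothesis, something like the sketch
below could log how fast the kernel's swap and fault counters grow.
It is an assumption on my part that the system-wide counters in
/proc/vmstat (pswpin, pswpout, pgfault, pgmajfault) are good enough
here; it does not read the cgroup memory.stat file, and the helper
names are hypothetical.

```python
import time

# System-wide counters from /proc/vmstat; pswpin/pswpout count pages.
SWAP_COUNTERS = ("pswpin", "pswpout", "pgfault", "pgmajfault")


def parse_vmstat(text):
    """Map counter name -> value, from /proc/vmstat content."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def counter_deltas(before, after, names=SWAP_COUNTERS):
    """Growth of each named counter between two samples."""
    return {n: after[n] - before[n]
            for n in names if n in before and n in after}


def sample_swap_activity(interval=5, samples=12, path="/proc/vmstat"):
    """Print counter growth per interval; all zeros means no swap I/O."""
    with open(path) as f:
        prev = parse_vmstat(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = parse_vmstat(f.read())
        print(time.strftime("%H:%M:%S"), counter_deltas(prev, cur))
        prev = cur
```

If pswpout and pgmajfault stay flat across the 15:22-15:24 style
window when the hang starts, swap pressure is probably not the cause.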



-- 
Chris Murphy
