From: Filipe Manana <fdmanana@gmail.com>
To: Drazen Kacar <drazen.kacar@oradian.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Deadlock between btrfs-transacti and userland on 5.2.1
Date: Wed, 11 Sep 2019 15:59:05 +0100 [thread overview]
Message-ID: <CAL3q7H4sB=usrcGwwHa3jkVjsWBnNxNjLYaYJSo2tMsY0t6zRQ@mail.gmail.com> (raw)
In-Reply-To: <DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com>
On Thu, Jul 18, 2019 at 1:25 PM Drazen Kacar <drazen.kacar@oradian.com> wrote:
>
> Hello,
>
> I think I've encountered a deadlock between btrfs-transacti and postgres
> process(es). This is system information (btrfs fi usage obtained after
> poweroff and boot):
>
> # cat /etc/redhat-release
> CentOS Linux release 7.6.1810 (Core)
>
> # uname -a
> Linux prod-dbsnap-01 5.2.1-1.el7.elrepo.x86_64 #1 SMP Sun Jul 14 08:15:04 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v5.2
>
> # btrfs filesystem usage /data/pg_data
> Overall:
> Device size: 2.00TiB
> Device allocated: 345.03GiB
> Device unallocated: 1.66TiB
> Device missing: 0.00B
> Used: 338.07GiB
> Free (estimated): 1.67TiB (min: 854.27GiB)
> Data ratio: 1.00
> Metadata ratio: 2.00
> Global reserve: 512.00MiB (used: 0.00B)
>
> Data,RAID0: Size:332.00GiB, Used:329.22GiB
> /dev/sdb 83.00GiB
> /dev/sdc 83.00GiB
> /dev/sdd 83.00GiB
> /dev/sde 83.00GiB
>
> Metadata,RAID10: Size:6.50GiB, Used:4.42GiB
> /dev/sdb 3.25GiB
> /dev/sdc 3.25GiB
> /dev/sdd 3.25GiB
> /dev/sde 3.25GiB
>
> System,RAID10: Size:16.00MiB, Used:48.00KiB
> /dev/sdb 8.00MiB
> /dev/sdc 8.00MiB
> /dev/sdd 8.00MiB
> /dev/sde 8.00MiB
>
> Unallocated:
> /dev/sdb 425.74GiB
> /dev/sdc 425.74GiB
> /dev/sdd 425.74GiB
> /dev/sde 425.74GiB
>
> There were three btrfs subvolumes and on each one there was a Postgres
> database slave doing recovery (single threaded writing). But there was a
> lot of writing. And prior to starting Postgres slaves I was restoring base
> backup from the backup server, which was being done by a number of
> parallel rsync processes (12 at most, I think).
>
> The file system is mounted with:
>
> # grep btrfs /proc/mounts
> /dev/sdd /data/pg_data btrfs rw,noatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/ 0 0
>
> After several hours I got this in /var/log/messages:
>
> # grep 'INFO: task .*blocked for more than' messages
> Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 122 seconds.
> Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 122 seconds.
> Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 122 seconds.
> Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 122 seconds.
> Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 122 seconds.
> Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 245 seconds.
> Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 245 seconds.
> Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 245 seconds.
> Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 245 seconds.
> Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 245 seconds.
>
> Full log is in the attachment. It seems to me that there is a deadlock
> between btrfs-transacti and any process which is trying to write
> something (not sure about reading). While this was going on the cpu
> usage (according to top) was non-existant. There are 4 cpus (it's a virtual
> machine in VmWare) and 3 were 100% idle. The 4th was 100% in
> waiting. (I didn't find out which process was on that cpu, unfortunately.)
>
> I powered off the machine (yesterday), booted this morning and things
> are working without errors. I stopped postgres clusters, though.
>
> I have a few questions:
>
> 1. After something like this happens and the machine is rebooted is there
> a procedure which would lower the probablity of encountering the deadlock
> again (maybe btrfs scrub or btrfs defragment or something like that)? This
> happened after a heavy write activity, so maybe fragmentation had
> something to do with it.
>
> 2. Should I run btrfsck (or something else) to verify on-disk integrity after a
> problem like this? Or it's just an in-memory problem, so I can assume that
> nothing bad happened to the data on disks.
>
> 3. I'd like to write a watchdog program to catch deadlocks and reboot
> (probably power-cycle) the VM, but I'm not sure what would the appropriate
> check be. Does it have to write something to the disk or reading would be
> sufficient? And how to bypass the OS buffer cache (fsync() or O_DIRECT
> should do it for writing, but I'm not sure about reading)?
>
> What would the appropriate timeout be? (If the operation doesn't
> complete in xx seconds a reboot should be triggered, but I don't know how
> many seconds to wait when there's a heavy load and things are just slow,
> but there's no deadlock.)
>
> Should I put watchdog process in real-time class (or however the equivalent
> is called on Linux)? Since this is a mainline kernel, I'm not sure if I could assume
> that real-time support won't have bugs of its own.
>
> And last, but not least, is there additional data that could help with debugging
> issues like this? (If possible, something that could be programmed into
> watchdog service.)
It's a regression introduced in 5.2
Fix just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u
Thanks.
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
prev parent reply other threads:[~2019-09-11 14:59 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <DB8PR03MB56287A9E527898E5727BB41599C80@DB8PR03MB5628.eurprd03.prod.outlook.com>
2019-07-18 12:23 ` Deadlock between btrfs-transacti and userland on 5.2.1 Drazen Kacar
2019-07-19 12:12 ` Drazen Kacar
2019-09-11 14:59 ` Filipe Manana [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAL3q7H4sB=usrcGwwHa3jkVjsWBnNxNjLYaYJSo2tMsY0t6zRQ@mail.gmail.com' \
--to=fdmanana@gmail.com \
--cc=drazen.kacar@oradian.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).