linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@gmail.com>
To: Drazen Kacar <drazen.kacar@oradian.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Deadlock between btrfs-transacti and userland on 5.2.1
Date: Wed, 11 Sep 2019 15:59:05 +0100	[thread overview]
Message-ID: <CAL3q7H4sB=usrcGwwHa3jkVjsWBnNxNjLYaYJSo2tMsY0t6zRQ@mail.gmail.com> (raw)
In-Reply-To: <DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com>

On Thu, Jul 18, 2019 at 1:25 PM Drazen Kacar <drazen.kacar@oradian.com> wrote:
>
> Hello,
>
> I think I've encountered a deadlock between btrfs-transacti and postgres
> process(es). This is system information (btrfs fi usage obtained after
> poweroff and boot):
>
> # cat /etc/redhat-release
> CentOS Linux release 7.6.1810 (Core)
>
> # uname -a
> Linux prod-dbsnap-01 5.2.1-1.el7.elrepo.x86_64 #1 SMP Sun Jul 14 08:15:04 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v5.2
>
> # btrfs filesystem usage /data/pg_data
>  Overall:
>      Device size:                   2.00TiB
>      Device allocated:            345.03GiB
>      Device unallocated:            1.66TiB
>      Device missing:                  0.00B
>      Used:                        338.07GiB
>      Free (estimated):              1.67TiB      (min: 854.27GiB)
>      Data ratio:                       1.00
>      Metadata ratio:                   2.00
>      Global reserve:              512.00MiB      (used: 0.00B)
>
>  Data,RAID0: Size:332.00GiB, Used:329.22GiB
>     /dev/sdb       83.00GiB
>     /dev/sdc       83.00GiB
>     /dev/sdd       83.00GiB
>     /dev/sde       83.00GiB
>
>  Metadata,RAID10: Size:6.50GiB, Used:4.42GiB
>     /dev/sdb        3.25GiB
>     /dev/sdc        3.25GiB
>     /dev/sdd        3.25GiB
>     /dev/sde        3.25GiB
>
>  System,RAID10: Size:16.00MiB, Used:48.00KiB
>     /dev/sdb        8.00MiB
>     /dev/sdc        8.00MiB
>     /dev/sdd        8.00MiB
>     /dev/sde        8.00MiB
>
>  Unallocated:
>     /dev/sdb      425.74GiB
>     /dev/sdc      425.74GiB
>     /dev/sdd      425.74GiB
>     /dev/sde      425.74GiB
>
> There were three btrfs subvolumes and on each one there was a Postgres
> database slave doing recovery (single threaded writing). But there was a
> lot of writing. And prior to starting Postgres slaves I was restoring base
> backup from the backup server, which was being done by a number of
> parallel rsync processes (12 at most, I think).
>
> The file system is mounted with:
>
> # grep btrfs /proc/mounts
> /dev/sdd /data/pg_data btrfs rw,noatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/ 0 0
>
> After several hours I got this in /var/log/messages:
>
> # grep 'INFO: task .*blocked for more than' messages
>  Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 122 seconds.
>  Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 122 seconds.
>  Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 122 seconds.
>  Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 122 seconds.
>  Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 122 seconds.
>  Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 245 seconds.
>  Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 245 seconds.
>  Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 245 seconds.
>  Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 245 seconds.
>  Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 245 seconds.
>
> Full log is in the attachment. It seems to me that there is a deadlock
> between btrfs-transacti and any process which is trying to write
> something (not sure about reading). While this was going on the cpu
> usage (according to top) was non-existant. There are 4 cpus (it's a virtual
> machine in VmWare) and 3 were 100% idle. The 4th was 100% in
> waiting. (I didn't find out which process was on that cpu, unfortunately.)
>
> I powered off the machine (yesterday), booted this morning and things
> are working without errors. I stopped postgres clusters, though.
>
> I have a few questions:
>
> 1.  After something like this happens and the machine is rebooted is there
> a procedure which would lower the probablity of encountering the deadlock
> again (maybe btrfs scrub or btrfs defragment or something like that)? This
> happened after a heavy write activity,  so maybe fragmentation had
> something to do with it.
>
> 2. Should I run btrfsck (or something else) to verify on-disk integrity after a
> problem like this? Or it's just an in-memory problem, so I can assume that
> nothing bad happened to the data on disks.
>
> 3. I'd like to write a watchdog program to catch deadlocks and reboot
> (probably power-cycle) the VM, but I'm not sure what would the appropriate
> check be. Does it have to write something to the disk or reading would be
> sufficient? And how to bypass the OS  buffer cache (fsync() or O_DIRECT
> should do it for writing, but I'm not sure about reading)?
>
> What would the appropriate timeout be? (If the operation doesn't
> complete in xx seconds a reboot should be triggered, but I don't know how
> many seconds to wait when there's  a heavy load and things are just slow,
> but there's no deadlock.)
>
> Should I put watchdog process in real-time class (or however the equivalent
> is called on Linux)? Since this is a mainline kernel, I'm not sure if I could assume
> that real-time support won't have  bugs of its own.
>
> And last, but not least, is there additional data that could help with debugging
> issues like this? (If possible, something that could be programmed into
> watchdog service.)

It's a regression introduced in 5.2
Fix just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u

Thanks.



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

      parent reply	other threads:[~2019-09-11 14:59 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <DB8PR03MB56287A9E527898E5727BB41599C80@DB8PR03MB5628.eurprd03.prod.outlook.com>
2019-07-18 12:23 ` Deadlock between btrfs-transacti and userland on 5.2.1 Drazen Kacar
2019-07-19 12:12   ` Drazen Kacar
2019-09-11 14:59   ` Filipe Manana [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAL3q7H4sB=usrcGwwHa3jkVjsWBnNxNjLYaYJSo2tMsY0t6zRQ@mail.gmail.com' \
    --to=fdmanana@gmail.com \
    --cc=drazen.kacar@oradian.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).