linux-btrfs.vger.kernel.org archive mirror
From: Tony Lambiris <tony@libpcap.net>
To: CANznX5GqHJwx+y6b9kruzYXEP93oEhRtHyG_yuWwHV286+fQdw@mail.gmail.com
Cc: toth.f.janos@gmail.com, llowrey@nuclearwinter.com,
	linux-btrfs@vger.kernel.org
Subject: Re: Possible deadlock when writing
Date: Mon, 3 Dec 2018 04:36:14 -0500	[thread overview]
Message-ID: <CAFv=4tESCPF6vo64apw5G=5_EHFxPt1Q2F_tci7JXvD1-_8RvQ@mail.gmail.com> (raw)
In-Reply-To: <ABEF1951-66FC-48F8-A7C4-0D499F48AF84@gmail.com>

I've been running into what I believe is the same issue ever since
upgrading to 4.19:

[28950.083040] BTRFS error (device dm-0): bad tree block start, want
1815648960512 have 0
[28950.083047] BTRFS: error (device dm-0) in __btrfs_free_extent:6804:
errno=-5 IO failure
[28950.083048] BTRFS info (device dm-0): forced readonly
[28950.083050] BTRFS: error (device dm-0) in
btrfs_run_delayed_refs:2935: errno=-5 IO failure
[28950.083866] BTRFS error (device dm-0): pending csums is 9564160
[29040.413973] TaskSchedulerFo[17189]: segfault at 0 ip
000056121a2cb73b sp 00007f1cca425b80 error 4 in
chrome[561218101000+6513000]

This has been happening consistently to me on two laptops and a
workstation, all running Arch Linux. The hardware is all different; the
only things they have in common are SSD/NVMe storage and btrfs.

I initially thought it had something to do with the fstrim.timer unit
kicking off an fstrim run that was somehow causing contention with
btrfs. As luck would have it, the btrfs filesystem on one of my laptops
just remounted read-only. Physical memory was not entirely used up at
the time (I would guess ~45% utilization), and I believe the rest of
available memory was being used by the VFS buffer/page cache. I'm not
100% sure of the actual utilization, but after reading the email from
mbakiev@ I did make a mental note of it before initiating the required
reboot.
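
For what it's worth, the anonymous-memory vs. page-cache split can be
snapshotted from /proc/meminfo before rebooting; a rough one-liner
(field names as reported by current kernels) would be something like:

```shell
# Approximate memory breakdown: overall usage vs. page cache, from /proc/meminfo
awk '/^MemTotal:/     {total=$2}
     /^MemAvailable:/ {avail=$2}
     /^Cached:/       {cached=$2}
     END {
       printf "used: %.0f%% of RAM, page cache: %.0f%% of RAM\n",
              (total - avail) * 100 / total, cached * 100 / total
     }' /proc/meminfo
```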

I came across this comment from Ubuntu's bugtracker:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/159356/comments/62

The author of post #62 notes that this particular behavior happens
when they are running several instances of Chrome. I don't know if that
bug is related at all, but interestingly, I too am almost always
interacting with Google Chrome when the read-only remount happens.

Here is the last entry from journald before I rebooted:
Dec 03 00:00:39 tenforward kernel: BTRFS error (device dm-3): bad tree
block start, want 761659392 have 15159222128734632161
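
If anyone wants to pull the same messages out of the persistent journal
after a reboot (e.g. from `journalctl -k -b -1`) or a saved dmesg dump,
a small grep filter does it; the sample input below is only for
illustration:

```shell
# Filter kernel log lines down to btrfs trouble indicators
btrfs_errors() {
  grep -E 'BTRFS (error|warning)|forced readonly'
}

# Example run over sample lines: the first and third lines pass the filter
printf '%s\n' \
  'kernel: BTRFS error (device dm-3): bad tree block start, want 761659392' \
  'kernel: usb 1-1: new high-speed USB device number 4' \
  'kernel: BTRFS info (device dm-3): forced readonly' |
  btrfs_errors
```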

Here are the only sysctl changes I have made that might be relevant:
vm.swappiness = 10
vm.overcommit_memory = 1
vm.oom_kill_allocating_task = 1
vm.panic_on_oom = 1
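
For context, these are applied via sysctl; to make them persistent they
would live in a drop-in like the following (the file name is just an
example):

```
# /etc/sysctl.d/99-vm-tuning.conf (example path)
# Applied at boot, or immediately with: sysctl -p /etc/sysctl.d/99-vm-tuning.conf
vm.swappiness = 10
vm.overcommit_memory = 1
vm.oom_kill_allocating_task = 1
vm.panic_on_oom = 1
```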

Hope I didn't miss anything, thanks!

On Sat, Dec 1, 2018 at 6:21 PM Martin Bakiev <mbakiev@gmail.com> wrote:
>
> I was having the same issue with kernels 4.19.2 and 4.19.4. I don’t appear to have the issue with 4.20.0-0.rc1 on Fedora Server 29.
>
> The issue is very easy to reproduce on my setup, not sure how much of it is actually relevant, but here it is:
>
> - 3 drive RAID5 created
> - Some data moved to it
> - Expanded to 7 drives
> - No balancing
>
> The issue is easily reproduced (within 30 mins) by starting multiple transfers to the volume (several TB in the form of many 30GB+ files). Multiple concurrent ‘rsync’ transfers seems to take a bit longer to trigger the issue, but multiple ‘cp’ commands will do it much quicker (again not sure if relevant).
>
> I have not seen the issue occur with a single ‘rsync’ or ‘cp’ transfer, but I haven’t left one running alone for too long (copying the data from multiple drives, so there is a lot to be gained from parallelizing the transfers).
>
> I’m not sure what state the FS is left in after Magic SysRq reboot after it deadlocks, but seemingly it’s fine. No problems mounting and ‘btrfs check’ passes OK. I’m sure some of the data doesn’t get flushed, but it’s no problem for my use case.
>
> I’ve been running concurrent transfers nonstop with kernel 4.20.0-0.rc1 for 24 hours and I haven’t experienced the issue.
>
> Hope this helps.

Thread overview: 4+ messages
2018-12-01 23:21 Possible deadlock when writing Martin Bakiev
2018-12-03  9:36 ` Tony Lambiris [this message]
2018-11-26 18:18 Larkin Lowrey
2018-12-01 15:15 ` Janos Toth F.
