linux-btrfs.vger.kernel.org archive mirror
From: DanglingPointer <danglingpointerexception@gmail.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Cc: danglingpointerexception@gmail.com
Subject: Re: migrating to space_cache=2 and btrfs userspace commands
Date: Fri, 16 Jul 2021 02:40:23 +1000
Message-ID: <a4ef513e-c7a4-99e0-c957-206a3763d9d1@gmail.com>
In-Reply-To: <ec9e92d8-ddfd-a103-6175-5176827ce9aa@gmx.com>

Hi Qu,

Just an update: setting the mount options "space_cache=v2" and
"noatime" completely SOLVED the performance problem!
The difference is like night and day!


These are my full fstab mount options...

btrfs defaults,autodefrag,space_cache=v2,noatime 0 2
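
To confirm that the options really are active on the mounted filesystem
(and not just written in fstab), something like the sketch below could be
used. has_mount_opts is a hypothetical helper name and /mnt/array is a
placeholder mount point, neither comes from this thread.

```shell
# has_mount_opts OPTIONS REQUIRED...
# Succeeds only if every REQUIRED option appears as a whole entry in the
# comma-separated OPTIONS string.
has_mount_opts() {
    opts=",$1,"
    shift
    for want in "$@"; do
        case "$opts" in
            *",$want,"*) ;;      # present, check the next one
            *) return 1 ;;       # missing
        esac
    done
    return 0
}

# Example (placeholder mount point):
#   has_mount_opts "$(findmnt -n -o OPTIONS /mnt/array)" space_cache=v2 noatime \
#       && echo "space_cache=v2 and noatime are active"
```

Matching whole comma-separated entries avoids treating a plain
"space_cache" entry as if it were "space_cache=v2".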


Perhaps defaulting to space_cache=v2 should be considered?  Why default
to v1? What is the value of v1?


In conclusion, for large multi-terabyte arrays (in my case RAID5),
setting space_cache=v2 and noatime massively improved performance and
eliminated the long pauses that occurred at frequent intervals while
"btrfs-transacti" blocked all IO.

Thanks Qu for your help!



On 14/7/21 5:45 pm, Qu Wenruo wrote:
>
>
> On 2021/7/14 3:18 PM, DanglingPointer wrote:
>> a) "echo l > /proc/sysrq-trigger"
>>
>> The backup finished today already unfortunately and we are unlikely to
>> run it again until we get an outage to remount the array with the
>> space_cache=v2 and noatime mount options.
>> Thanks for the command, we'll definitely use it if/when it happens again
>> on the next large migration of data.
>
> Just to avoid confusion, after that command, "dmesg" output is still
> needed, as that's where sysrq puts its output.
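
The two steps (trigger sysrq, then collect dmesg) could be wrapped up
roughly as below. This is only a sketch: capture_sysrq_l is a hypothetical
name, the SYSRQ_TRIGGER, DMESG_CMD and KEEP_LINES variables are made-up
knobs for this example, and the 500-line default is an arbitrary guess at
how much log is worth keeping. It needs to run as root.

```shell
# capture_sysrq_l [OUTFILE]
# Ask the kernel to dump backtraces of all active CPUs (sysrq "l"), then
# save the tail of the kernel log, which is where that dump ends up.
# The defaults are the real trigger file and the real dmesg command; the
# overrides exist only so the function can be tried without root.
capture_sysrq_l() {
    out=${1:-sysrq-l.txt}
    echo l > "${SYSRQ_TRIGGER:-/proc/sysrq-trigger}" || return 1
    sleep 1   # give the kernel a moment to finish printing
    ${DMESG_CMD:-dmesg} | tail -n "${KEEP_LINES:-500}" > "$out"
}

# Example, run while a pause is in progress:
#   capture_sysrq_l /tmp/pause-backtrace.txt
```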
>>
>>
>> b) "sudo btrfs qgroup show -prce" ........
>>
>> $ ERROR: can't list qgroups: quotas not enabled
>>
>> So looks like it isn't enabled.
>
> One less thing to worry about.
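
For scripting the same check, the exit status of the command can be used
instead of reading the error text. A minimal sketch, assuming a
hypothetical helper name quotas_enabled and a hypothetical BTRFS variable
for overriding the binary; note that a non-zero status can also mean the
path is not btrfs or that permissions are missing, not only that quotas
are off.

```shell
# quotas_enabled MOUNTPOINT
# Succeeds if "btrfs qgroup show" succeeds on MOUNTPOINT.
# BTRFS is a made-up override for the binary name (useful for testing).
quotas_enabled() {
    "${BTRFS:-btrfs}" qgroup show -prce "$1" >/dev/null 2>&1
}

# Example (placeholder mount point):
#   if quotas_enabled /mnt/array; then
#       echo "qgroups are enabled"
#   else
#       echo "qgroups are off (or the command failed for another reason)"
#   fi
```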
>>
>> File sizes are between: 1,048,576 bytes and 16,777,216 bytes (Duplicacy
>> backup defaults)
>
> Between 1~16MiB, thus tons of small files.
>
> Btrfs is not really good at handling tons of small files, as they
> generate a lot of metadata.
>
> That may contribute to the hang.
>
>>
>> What classifies as a transaction?
>
> It's a little complex.
>
> Technically it's a checkpoint: before the checkpoint, all you see is
> old data; after the checkpoint, all you see is new data.
>
> To end users, any data and metadata write will be included in one
> transaction (with dependencies handled properly).
>
> One way to finish (or commit) the current transaction is to sync the fs,
> using the "sync" command (which syncs all filesystems).
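
A rough way to see how long such a forced commit takes is to time "sync"
itself. This is only a sketch with a hypothetical helper name (time_sync);
since plain "sync" flushes every mounted filesystem, the number is an
upper bound for the btrfs array, and it only has whole-second resolution.

```shell
# time_sync
# Run "sync" and print how many whole seconds it took.
time_sync() {
    start=$(date +%s)
    sync || return 1
    end=$(date +%s)
    echo $((end - start))
}

# Example: run during a backup and watch for unusually large values.
#   echo "sync took $(time_sync)s"
```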
>
>> Any/All writes done in a 30sec
>> interval?
>
> This is the default commit interval. Almost all filesystems will try to
> commit their data/metadata to disk after a configurable interval.
>
> The default is 30s. That's also one way to commit the current
> transaction.
>
>>   If 100 unique files were written in 30secs, is that 1
>> transaction or 100 transactions?
>
> It depends, as things like syncfs() and subvolume/snapshot creation may
> try to commit the transaction.
>
> But without those special operations, just writing 100 unique files
> using buffered writes would only start one transaction, and when the
> 30s interval is hit, the transaction will be committed to disk.
>
>>   Millions of files of the size range
>> above were backed up.
>
> The number of files may not force a transaction commit if it doesn't
> trigger enough memory pressure or free space pressure.
>
> Anyway, the "echo l" sysrq would help us locate what's taking so
> long.
>
>>
>>
>> c) "Just mount with "space_cache=v2""
>>
>> Ok so no need to "clear_cache" the v1 cache, right?
>
> Yes, and "clear_cache" won't really remove all the v1 cache anyway.
>
> Thus it doesn't help much.
>
> The only way to fully clear v1 cache is by using "btrfs check
> --clear-space-cache v1" on an *unmounted* btrfs.
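
Put together, an offline migration that also clears the old v1 cache might
look like the sketch below. It is not an official procedure: run and
migrate_to_v2 are hypothetical helper names, the device and mount point in
the example are placeholders, and by default the commands are only printed
(DRY_RUN is a made-up switch for this example).

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root) to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# migrate_to_v2 DEVICE MOUNTPOINT
# Unmount, clear the v1 space cache offline, then mount with v2.
# Each step runs only if the previous one succeeded.
migrate_to_v2() {
    dev=$1
    mnt=$2
    run umount "$mnt" &&
    run btrfs check --clear-space-cache v1 "$dev" &&
    run mount -o space_cache=v2,noatime "$dev" "$mnt"
}

# Example (placeholder device and mount point):
#   migrate_to_v2 /dev/mapper/example /mnt/array
```

Clearing v1 is optional per the explanation above; mounting with
"space_cache=v2" alone is enough to switch.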
>
>> I wrote this in the fstab but hadn't remounted yet until I can get an
>> outage....
>
> IMHO if you really want to test if v2 would help, you can just remount,
> no need to wait for a break.
>
> Thanks,
> Qu
>>
>> ..."btrfs defaults,autodefrag,clear_cache,space_cache=v2,noatime  0  2"
>
>> Thanks again for your help Qu!
>>
>> On 14/7/21 2:59 pm, Qu Wenruo wrote:
>>>
>>>
>>> On 2021/7/13 11:38 PM, DanglingPointer wrote:
>>>> We're currently considering switching to "space_cache=v2" with noatime
>>>> mount options for my lab server-workstations running RAID5.
>>>
>>> Btrfs RAID5 is unsafe due to its write-hole problem.
>>>
>>>>
>>>>   * One has 13TB of data/metadata in a bunch of 6TB and 2TB disks
>>>>     totalling 26TB.
>>>>   * Another has about 12TB data/metadata in uniformly sized 6TB disks
>>>>     totalling 24TB.
>>>>   * Both of the arrays are on individually luks encrypted disks with
>>>>     btrfs on top of the luks.
>>>>   * Both have "defaults,autodefrag" turned on in fstab.
>>>>
>>>> We're starting to see large pauses during constant backups of millions
>>>> of chunk files (using duplicacy backup) in the 24TB array.
>>>>
>>>> Pauses sometimes last 20+ seconds and recur roughly 30 seconds after
>>>> the end of the previous pause.  The "btrfs-transacti" process
>>>> consistently shows up as the blocking process/thread locking up
>>>> filesystem IO.  IO gets into the RAID5 array via nfsd. There are no
>>>> disk or btrfs errors recorded.  The last scrub finished successfully
>>>> yesterday.
>>>
>>> Please provide the "echo l > /proc/sysrq-trigger" output when such a
>>> pause happens.
>>>
>>> If you're using qgroup (may be enabled by things like snapper), it may
>>> be the cause, as qgroup does its accounting when committing a
>>> transaction.
>>>
>>> If one transaction is super large, it can cause such a problem.
>>>
>>> You can test if qgroup is enabled by:
>>>
>>> # btrfs qgroup show -prce <mnt>
>>>
>>>>
>>>> After doing some research around the internet, we arrived at the
>>>> plan described above.  Unfortunately the official
>>>> documentation isn't clear on the following.
>>>>
>>>> Official documentation URL -
>>>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
>>>>
>>>> 1. How to migrate from default space_cache=v1 to space_cache=v2? It
>>>>     talks about the reverse, from v2 to v1!
>>>
>>> Just mount with "space_cache=v2".
>>>
>>>> 2. If we use space_cache=v2, is it indeed still the case that the
>>>>     "btrfs" command will NOT work with the filesystem?
>>>
>>> Why would you think "btrfs" won't work on a btrfs?
>>>
>>> Thanks,
>>> Qu
>>>
>>>>   So will our
>>>>     "btrfs scrub start /mount/point/..." cron jobs FAIL? I'm guessing
>>>>     the btrfs command comes from btrfs-progs, which is currently
>>>>     v5.4.1-2 amd64, is that correct?
>>>> 3. Any other ideas on how we can get rid of those annoying pauses with
>>>>     large backups into the array?
>>>>
>>>> Thanks in advance!
>>>>
>>>> DP
>>>>
