From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: migrating to space_cache=2 and btrfs userspace commands
To: DanglingPointer, linux-btrfs@vger.kernel.org
References: <63396688-0dc7-17c5-a830-5893b030a30f@gmail.com> <86f0624a-cba4-58a3-0a80-460d3f12e8b3@gmx.com>
From: Qu Wenruo
Message-ID: <94f7f31a-21d6-5cc4-fd20-4641f31aa682@gmx.com>
Date: Fri, 16 Jul 2021 06:13:10 +0800
X-Mailing-List: linux-btrfs@vger.kernel.org

On 2021/7/16 12:40 AM, DanglingPointer wrote:
> Hi Qu,
>
> Just updating here that setting the
> mount option "space_cache=v2" and
> "noatime" completely SOLVED the performance problem!
> Basically like night and day!
>
>
> These are my full fstab mount options...
>
> btrfs defaults,autodefrag,space_cache=v2,noatime 0 2
>
>
> Perhaps defaulting the space_cache=v2 should be considered?

We're already considering that.

> Why default
> to v1, what's the value of v1?

One of the problems in the past was the lack of write ability in btrfs-progs.
Now we're testing making it the default in mkfs.btrfs.

Thanks,
Qu

>
>
> So for conclusion, for large multi-terabyte arrays (in my case RAID5s),
> setting space_cache=v2 and noatime massively increases performance and
> eliminates the large long pauses in frequent intervals by
> "btrfs-transacti" blocking all IO.
>
> Thanks Qu for your help!
>
>
>
> On 14/7/21 5:45 pm, Qu Wenruo wrote:
>>
>>
>> On 2021/7/14 3:18 PM, DanglingPointer wrote:
>>> a) "echo l > /proc/sysrq-trigger"
>>>
>>> The backup finished today already unfortunately and we are unlikely to
>>> run it again until we get an outage to remount the array with the
>>> space_cache=v2 and noatime mount options.
>>> Thanks for the command, we'll definitely use it if/when it happens again
>>> on the next large migration of data.
>>
>> Just to avoid confusion, after that command, "dmesg" output is still
>> needed, as that's where sysrq puts its output.
>>>
>>>
>>> b) "sudo btrfs qgroup show -prce" ........
>>>
>>> $ ERROR: can't list qgroups: quotas not enabled
>>>
>>> So looks like it isn't enabled.
>>
>> One less thing to bother with.
>>>
>>> File sizes are between: 1,048,576 bytes and 16,777,216 bytes (Duplicacy
>>> backup defaults)
>>
>> Between 1~16MiB, thus tons of small files.
>>
>> Btrfs is not really good at handling tons of small files, as they
>> generate a lot of metadata.
>>
>> That may contribute to the hang.
>>
>>>
>>> What classifies as a transaction?
>>
>> It's a little complex.
>>
>> Technically it's a checkpoint: before the checkpoint, all you see
>> is old data; after the checkpoint, all you see is new data.
>>
>> To end users, any data and metadata write will be included into one
>> transaction (with proper dependency handled).
>>
>> One way to finish (or commit) the current transaction is to sync the fs,
>> using the "sync" command (syncs all filesystems).
>>
>>> Any/All writes done in a 30sec
>>> interval?
>>
>> This is the default commit interval. Almost all fses will try to commit
>> their data/metadata to disk after a configurable interval.
>>
>> The default one is 30s. That's also one way to commit the current
>> transaction.
>>
>>> If 100 unique files were written in 30secs, is that 1
>>> transaction or 100 transactions?
>>
>> It depends, as things like syncfs() and subvolume/snapshot creation may
>> try to commit the transaction.
>>
>> But without those special operations, just writing 100 unique files
>> using buffered write, it would only start one transaction, and when the
>> 30s interval is hit, the transaction will be committed to disk.
>>
>>> Millions of files of the size range
>>> above were backed up.
>>
>> The amount of files may not force a transaction commit, if it doesn't
>> trigger enough memory pressure, or free space pressure.
>>
>> Anyway, the "echo l" sysrq would help us to locate what's taking so
>> long.
>>
>>>
>>>
>>> c) "Just mount with "space_cache=v2""
>>>
>>> Ok so no need to "clear_cache" the v1 cache, right?
>>
>> Yes, and "clear_cache" won't really remove all the v1 cache anyway.
>>
>> Thus it doesn't help much.
>>
>> The only way to fully clear v1 cache is by using "btrfs check
>> --clear-space-cache v1" on an *unmounted* btrfs.
>>
>>> I wrote this in the fstab but hadn't remounted yet until I can get an
>>> outage....
>>
>> IMHO if you really want to test if v2 would help, you can just remount,
>> no need to wait for a break.
>>
>> Thanks,
>> Qu
>>>
>>> ..."btrfs defaults,autodefrag,clear_cache,space_cache=v2,noatime  0  2
>>>
>>> Thanks again for your help Qu!
>>>
>>> On 14/7/21 2:59 pm, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2021/7/13 11:38 PM, DanglingPointer wrote:
>>>>> We're currently considering switching to "space_cache=v2" with noatime
>>>>> mount options for my lab server-workstations running RAID5.
>>>>
>>>> Btrfs RAID5 is unsafe due to its write-hole problem.
>>>>
>>>>>
>>>>>   * One has 13TB of data/metadata in a bunch of 6TB and 2TB disks
>>>>>     totalling 26TB.
>>>>>   * Another has about 12TB data/metadata in uniformly sized 6TB disks
>>>>>     totalling 24TB.
>>>>>   * Both of the arrays are on individually luks encrypted disks with
>>>>>     btrfs on top of the luks.
>>>>>   * Both have "defaults,autodefrag" turned on in fstab.
>>>>>
>>>>> We're starting to see large pauses during constant backups of millions
>>>>> of chunk files (using duplicacy backup) in the 24TB array.
>>>>>
>>>>> Pauses sometimes last 20+ seconds and recur roughly every ~30secs
>>>>> after the end of the last pause.  The "btrfs-transacti" process
>>>>> consistently shows up as the blocking process/thread locking up
>>>>> filesystem IO.  IO gets into the RAID5 array via nfsd. There are no
>>>>> disk or btrfs errors recorded.  scrub last finished yesterday
>>>>> successfully.
>>>>
>>>> Please provide the "echo l > /proc/sysrq-trigger" output when such
>>>> a pause happens.
>>>>
>>>> If you're using qgroup (may be enabled by things like snapper), it may
>>>> be the cause, as qgroup does its accounting when committing
>>>> transaction.
>>>>
>>>> If one transaction is super large, it can cause such a problem.
>>>>
>>>> You can test if qgroup is enabled by:
>>>>
>>>> # btrfs qgroup show -prce
>>>>
>>>>>
>>>>> After doing some research around the internet, we've come to the
>>>>> consideration above as described.  Unfortunately the official
>>>>> documentation isn't clear on the following.
>>>>>
>>>>> Official documentation URL -
>>>>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
>>>>>
>>>>> 1. How to migrate from default space_cache=v1 to space_cache=v2? It
>>>>>     talks about the reverse, from v2 to v1!
>>>>
>>>> Just mount with "space_cache=v2".
>>>>
>>>>> 2. If we use space_cache=v2, is it indeed still the case that the
>>>>>     "btrfs" command will NOT work with the filesystem?
>>>>
>>>> Why would you think "btrfs" won't work on a btrfs?
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>>   So will our
>>>>>     "btrfs scrub start /mount/point/..." cron jobs FAIL? I'm guessing
>>>>>     the btrfs command comes from btrfs-progs which is currently
>>>>>     v5.4.1-2 amd64, is that correct?
>>>>> 3. Any other ideas on how we can get rid of those annoying pauses with
>>>>>     large backups into the array?
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> DP
>>>>>
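
A minimal sketch related to the fstab options discussed in this thread: a small POSIX shell helper that checks whether a comma-separated mount option string contains a given option, such as space_cache=v2 or noatime. The helper name has_mount_opt and the sample option string are illustrative assumptions and not part of btrfs-progs. On a live system the option string would presumably be taken from the fourth field of /etc/fstab or /proc/mounts for the btrfs mount point.

```shell
#!/bin/sh
# Hypothetical helper (not a btrfs-progs command): succeed if the
# comma-separated option string in $1 contains exactly the option in $2.
has_mount_opt() {
    # Wrap both sides in commas so "atime" does not match "noatime".
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Sample option string, copied from the fstab line quoted in the thread.
opts="defaults,autodefrag,space_cache=v2,noatime"

for want in space_cache=v2 noatime; do
    if has_mount_opt "$opts" "$want"; then
        echo "$want: present"
    else
        echo "$want: missing"
    fi
done
```

The comma wrapping is a deliberate choice: it gives whole-option matching without relying on grep or any non-POSIX shell feature.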