Subject: Re: migrating to space_cache=2 and btrfs userspace commands
To: Qu Wenruo, linux-btrfs@vger.kernel.org
References: <63396688-0dc7-17c5-a830-5893b030a30f@gmail.com> <86f0624a-cba4-58a3-0a80-460d3f12e8b3@gmx.com>
From: DanglingPointer
Cc: danglingpointerexception@gmail.com
Date: Fri, 16 Jul 2021 02:40:23 +1000

Hi Qu,

Just updating here that setting the mount options "space_cache=v2" and
"noatime" completely SOLVED the performance problem!  Basically like
night and day!

These are my full fstab mount options (a complete example entry is
sketched below)...

    btrfs  defaults,autodefrag,space_cache=v2,noatime  0  2

Perhaps making space_cache=v2 the default should be considered?  Why
default to v1; what's the value of v1?

So in conclusion: for large multi-terabyte arrays (in my case RAID5s),
setting space_cache=v2 and noatime massively increases performance and
eliminates the long pauses that "btrfs-transacti" was causing at
frequent intervals by blocking all IO.

Thanks Qu for your help!
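For anyone else following the thread, a complete fstab entry with these
options would look something like the sketch below; the UUID and mount
point are placeholders, not my real values:

    # /etc/fstab - example only, substitute your own filesystem UUID and mount point
    UUID=00000000-0000-0000-0000-000000000000  /mnt/raid5  btrfs  defaults,autodefrag,space_cache=v2,noatime  0  2

    # Or, per Qu's suggestion further down, apply the options to the
    # live mount without waiting for an outage window:
    sudo mount -o remount,space_cache=v2,noatime /mnt/raid5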
On 14/7/21 5:45 pm, Qu Wenruo wrote:
>
>
> On 2021/7/14 3:18 PM, DanglingPointer wrote:
>> a) "echo l > /proc/sysrq-trigger"
>>
>> The backup finished today already unfortunately, and we are unlikely to
>> run it again until we get an outage to remount the array with the
>> space_cache=v2 and noatime mount options.
>> Thanks for the command, we'll definitely use it if/when it happens again
>> on the next large migration of data.
>
> Just to avoid confusion: after that command, the "dmesg" output is still
> needed, as that's where sysrq puts its output.
>>
>>
>> b) "sudo btrfs qgroup show -prce"
>>
>> $ ERROR: can't list qgroups: quotas not enabled
>>
>> So it looks like it isn't enabled.
>
> One less thing to worry about.
>>
>> File sizes are between 1,048,576 bytes and 16,777,216 bytes (Duplicacy
>> backup defaults).
>
> Between 1 and 16 MiB, thus tons of small files.
>
> Btrfs is not really good at handling tons of small files, as they
> generate a lot of metadata.
>
> That may contribute to the hang.
>
>>
>> What classifies as a transaction?
>
> It's a little complex.
>
> Technically it's a checkpoint: before the checkpoint, all you see is
> old data; after the checkpoint, all you see is new data.
>
> To end users, any data and metadata write will be included in one
> transaction (with proper dependencies handled).
>
> One way to finish (or commit) the current transaction is to sync the
> fs, using the "sync" command (which syncs all filesystems).
>
>> Any/all writes done in a 30 sec interval?
>
> That is the default commit interval. Almost all filesystems will try to
> commit their data/metadata to disk after a configurable interval.
>
> The default one is 30s. That's also one way the current transaction
> gets committed.
>
>> If 100 unique files were written in 30 secs, is that 1 transaction or
>> 100 transactions?
>
> It depends, as things like syncfs() and subvolume/snapshot creation may
> also commit the transaction.
>
> But without those special operations, just writing 100 unique files
> using buffered writes would only start one transaction, and when the
> 30s interval gets hit, that transaction will be committed to disk.
>
>> Millions of files of the size range above were backed up.
>
> The number of files alone may not force a transaction commit, if it
> doesn't trigger enough memory pressure or free space pressure.
>
> Anyway, the "echo l" sysrq would help us locate what's taking so long.
>
>>
>>
>> c) "Just mount with "space_cache=v2""
>>
>> Ok, so no need to "clear_cache" the v1 cache, right?
>
> Yes, and "clear_cache" won't really remove all the v1 cache anyway.
>
> Thus it doesn't help much.
>
> The only way to fully clear the v1 cache is by using "btrfs check
> --clear-space-cache v1" on an *unmounted* btrfs.
>
>> I wrote this in the fstab but hadn't remounted yet until I can get an
>> outage...
>
> IMHO if you really want to test whether v2 would help, you can just
> remount; no need to wait for a break.
>
> Thanks,
> Qu
>>
>> ..."btrfs defaults,autodefrag,clear_cache,space_cache=v2,noatime  0  2
>
>> Thanks again for your help Qu!
>>
>> On 14/7/21 2:59 pm, Qu Wenruo wrote:
>>>
>>>
>>> On 2021/7/13 11:38 PM, DanglingPointer wrote:
>>>> We're currently considering switching to "space_cache=v2" with the
>>>> noatime mount option for my lab server-workstations running RAID5.
>>>
>>> Btrfs RAID5 is unsafe due to its write-hole problem.
>>>
>>>>   * One has 13TB of data/metadata on a mix of 6TB and 2TB disks
>>>>     totalling 26TB.
>>>>   * Another has about 12TB of data/metadata on uniformly sized 6TB
>>>>     disks totalling 24TB.
>>>>   * Both arrays are on individually LUKS-encrypted disks with btrfs
>>>>     on top of the LUKS layer.
>>>>   * Both have "defaults,autodefrag" set in fstab.
>>>>
>>>> We're starting to see large pauses during constant backups of
>>>> millions of chunk files (using Duplicacy backup) on the 24TB array.
>>>>
>>>> Pauses sometimes last 20+ seconds and recur roughly every ~30 secs
>>>> after the end of the last pause.  The "btrfs-transacti" process
>>>> consistently shows up as the blocking process/thread locking up
>>>> filesystem IO.  IO gets into the RAID5 array via nfsd.  There are no
>>>> disk or btrfs errors recorded.  A scrub finished successfully
>>>> yesterday.
>>>
>>> Please provide the "echo l > /proc/sysrq-trigger" output when such a
>>> pause happens.
>>>
>>> If you're using qgroups (which may be enabled by things like snapper),
>>> they may be the cause, as qgroup does its accounting when committing a
>>> transaction.
>>>
>>> If one transaction is super large, it can cause such a problem.
>>>
>>> You can test whether qgroups are enabled with:
>>>
>>> # btrfs qgroup show -prce
>>>
>>>> After doing some research around the internet, we've come to the
>>>> considerations described above.  Unfortunately the official
>>>> documentation isn't clear on the following.
>>>>
>>>> Official documentation URL -
>>>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
>>>>
>>>> 1. How do we migrate from the default space_cache=v1 to
>>>>    space_cache=v2?  It only talks about the reverse, from v2 to v1!
>>>
>>> Just mount with "space_cache=v2".
>>>
>>>> 2. If we use space_cache=v2, is it indeed still the case that the
>>>>    "btrfs" command will NOT work with the filesystem?
>>>
>>> Why would you think "btrfs" won't work on a btrfs?
>>>
>>> Thanks,
>>> Qu
>>>
>>>>    So will our "btrfs scrub start /mount/point/..." cron jobs FAIL?
>>>>    I'm guessing the btrfs command comes from btrfs-progs, which is
>>>>    currently v5.4.1-2 amd64; is that correct?
>>>> 3. Any other ideas on how we can get rid of those annoying pauses
>>>>    with large backups into the array?
>>>>
>>>> Thanks in advance!
>>>>
>>>> DP
>>>>
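P.S. For anyone finding this thread in the archives, the v1-to-v2
migration steps discussed above amount to roughly the following sketch;
the device and mount point are placeholders, not our real paths:

    # optional: fully remove the old v1 cache; per Qu this requires the
    # filesystem to be unmounted and isn't needed for v2 to take effect
    sudo umount /mnt/raid5
    sudo btrfs check --clear-space-cache v1 /dev/mapper/luks-raid5-a

    # mount with the new options (or use "mount -o remount,..." on a live fs)
    sudo mount -o space_cache=v2,noatime /dev/mapper/luks-raid5-a /mnt/raid5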