From: Serhat Sevki Dincer
Date: Mon, 9 Aug 2021 15:36:28 +0300
Subject: Re: max_inline: alternative values?
To: linux-btrfs@vger.kernel.org
In-Reply-To: <9073e835-41c2-bdab-8e05-dfc759c0e22f@gmx.com>

Also, in the case of the DUP metadata profile, how about duplicating "only"
the metadata in the two blocks?
The total inline data space of 2 * 2048/3072 = 4096/6144 bytes could then
carry unduplicated data. That would require, I think, that metadata and
inline data have separate crc32c sums. Is that feasible?

On Mon, Aug 9, 2021 at 3:00 PM Qu Wenruo wrote:
>
> On 2021/8/9 7:15 PM, Serhat Sevki Dincer wrote:
> > Hi,
> >
> > I was reading btrfs mount options and max_inline=2048 (by default)
> > caught my attention.
> > I could not find any benchmarks on the internet comparing different
> > values for this parameter.
> > The most detailed info I could find is below from May 2016, when 2048
> > was set as the default.
> >
> > So on a new-ish 64-bit system (amd64 or arm64) with "SSD" (memory/file
> > blocks are 4K,
>
> For 64-bit arm64, there are 3 different default page sizes (4K, 16K and 64K).
> Thus it's a completely different beast, as btrfs currently doesn't support
> a sectorsize other than the page size.
>
> But we're already working on supporting 4K sectorsize with 64K page size;
> the initial support will arrive in v5.15 upstream.
>
> Anyway, for now we will only discuss 4K sectorsize for supported systems
> (amd64 or 4K page sized aarch64), with the default 16K nodesize.
>
> > metadata profile "single" by default), how would max_inline=2048
> > compare to, say, 3072?
> > Do you know/have any benchmarks comparing different values on a
> > typical Linux installation in terms of:
> > - performance
> > - total disk usage
>
> Personally speaking, I'm also very interested in such a benchmark, as
> subpage support is coming soon; apart from RAID56, only inline extent
> creation is disabled for subpage.
>
> Thus knowing the performance impact is really important.
>
> But there are more variables involved in such a "benchmark".
> Not only the inline file limit, but also things like the average file
> size involved in the "typical" setup.
>
> If we can define the "typical" setup, I guess it would be much easier to
> do the benchmark. Depending on the "typical" average file size and how
> concurrent the operations are, the result can change.
>
> From what I know, inline extent size affects the following things:
>
> - Metadata size
>   Obviously, but since you're mentioning the SSD default, it's less of a
>   concern, as metadata is also SINGLE in that case.
>
>   Much larger metadata will make the already slow btrfs metadata
>   operations even slower.
>
>   On the other hand, it will make such inlined data more compact,
>   as we no longer need to pad the data to sectorsize.
>
>   So I'm not sure about the final result.
>
> - Data writeback
>   With inline extents, we don't need to submit data writes, but inline
>   them directly into metadata.
>
>   This means we don't need to do things like data csum calculation, but
>   we also need to do more metadata csum calculation.
>
>   Again, no obvious result.
>
> >
> > What would be the "optimal" value for SSD on a typical desktop? server?
> > I bet it's not a big deal, but would be very happy to be proven wrong.
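One rough way to check whether a particular small file actually ended up
inlined (just an illustrative sketch; the mount point and file name below are
only placeholders) is filefrag, which should report an "inline" flag for
btrfs inline extents:

------
# Write a file below the max_inline limit, then inspect its extent layout.
# If the data was inlined into the metadata tree, filefrag -v should list
# an extent flagged "inline" (FIEMAP_EXTENT_DATA_INLINE).
xfs_io -f -c "pwrite 0 1K" /mnt/btrfs/tiny_file
sync
filefrag -v /mnt/btrfs/tiny_file
------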
> BTW, I just did a super stupid test:
> ------
> fill_dir()
> {
>         local dir=$1
>         for (( i = 0; i < 5120 ; i++ )); do
>                 xfs_io -f -c "pwrite 0 3K" $dir/file_$i > /dev/null
>         done
>         sync
> }
>
> dev="/dev/test/test"
> mnt="/mnt/btrfs"
>
> umount $dev &> /dev/null
> umount $mnt &> /dev/null
>
> mkfs.btrfs -f -s 4k -m single $dev
> mount $dev $mnt -o ssd,max_inline=2048
> echo "ssd,max_inline=2048"
> time fill_dir $mnt
> umount $mnt
>
> mkfs.btrfs -f -s 4k -m single $dev
> mount $dev $mnt -o ssd,max_inline=3072
> echo "ssd,max_inline=3072"
> time fill_dir $mnt
> umount $mnt
> ------
>
> The results are:
>
> ssd,max_inline=2048
> real    0m20.403s
> user    0m4.076s
> sys     0m16.607s
>
> ssd,max_inline=3072
> real    0m20.096s
> user    0m4.195s
> sys     0m16.213s
>
> Apart from the generally slow nature of btrfs metadata operations, it
> doesn't show much difference, at least for writeback performance.
>
> Thanks,
> Qu
>
> >
> > Thanks a lot..
> >
> > Note:
> > From: David Sterba
> >
> > commit f7e98a7fff8634ae655c666dc2c9fc55a48d0a73 upstream.
> >
> > The current practical default is ~4k on x86_64 (the logic is more complex,
> > simplified for brevity), the inlined files land in the metadata group and
> > thus consume space that could be needed for the real metadata.
> >
> > The inlining brings some usability surprises:
> >
> > 1) total space consumption measured on various filesystems and btrfs
> >    with DUP metadata was quite visible because of the duplicated data
> >    within metadata
> >
> > 2) inlined data may exhaust the metadata, which are more precious in case
> >    the entire device space is allocated to chunks (ie. balance cannot
> >    make the space more compact)
> >
> > 3) performance suffers a bit as the inlined blocks are duplicated and
> >    stored far away on the device.
> >
> > Proposed fix: set the default to 2048
> >
> > This fixes namely 1): the total filesystem space consumption will be on
> > par with other filesystems.
> >
> > Partially fixes 2): more data are pushed to the data block groups.
> >
> > The characteristics of 3) are based on actual small file size
> > distribution.
> >
> > The change is independent of the metadata blockgroup type (though it's
> > most visible with DUP) or system page size, as these parameters are not
> > trivial to find out, compared to file size.
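For the total-disk-usage half of the question, one possible extension of the
test above (just a sketch reusing the same $dev, $mnt and fill_dir from Qu's
script) would be to record space usage before each unmount, for example with
btrfs filesystem df:

------
# Same fill pattern as above, but also report how much Data vs. Metadata
# space each max_inline value ends up consuming before unmounting.
for limit in 2048 3072; do
        mkfs.btrfs -f -s 4k -m single $dev
        mount $dev $mnt -o ssd,max_inline=$limit
        echo "ssd,max_inline=$limit"
        time fill_dir $mnt
        btrfs filesystem df $mnt
        umount $mnt
done
------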