Subject: Re: Replacing disk strange (buggy?) behaviour - RAID1
From: Andrei Borzenkov
To: me@jse.io, linux-btrfs@vger.kernel.org
Date: Tue, 20 Apr 2021 21:19:48 +0300
Message-ID: <9bdb8872-3394-534f-a9e3-11dcc5ea2819@gmail.com>
X-Mailing-List: linux-btrfs@vger.kernel.org
On 19.04.2021 18:22, Jonah Sabean wrote:
> I'm running Ubuntu 21.04 (technically not a stable "release" yet, but
> it will be in a few days, so if this is an Ubuntu-specific issue I'd
> like to report it before it is!).
>
> The btrfs volume in question is two 8TB hard disks that were in RAID1
> at the time the filesystem was created. Kernel version is Ubuntu's
> 5.11.0-14-generic with btrfs-progs version 5.10.1-1build1 in the
> hirsute repos currently. This array is mostly non-changing archived
> data, if that even matters.
>
> I replaced a missing disk (sda is the replacement disk) last night
> while in a degraded mount (left it all night to complete) with `btrfs
> replace start 1 /dev/sda1 /mnt/btrfs` (1 was the missing disk in btrfs
> fi show) and it appears to have worked fine. However, when I ran
> `btrfs fi usage` it returned:
>
> Overall:
>     Device size:                 14.55TiB
>     Device allocated:             2.41TiB
>     Device unallocated:          12.14TiB
>     Device missing:                 0.00B
>     Used:                         1.60TiB
>     Free (estimated):             8.63TiB   (min: 6.61TiB)
>     Free (statfs, df):           12.14TiB
>     Data ratio:                      1.50
>     Metadata ratio:                  1.43
>     Global reserve:             512.00MiB   (used: 0.00B)
>     Multiple profiles:                yes   (data, metadata, system)
>
> Data,single: Size:820.00GiB, Used:3.25MiB (0.00%)
>    /dev/sdb1     820.00GiB
>
> Data,RAID1: Size:819.00GiB, Used:818.64GiB (99.96%)
>    /dev/sda1     819.00GiB
>    /dev/sdb1     819.00GiB
>
> Metadata,single: Size:4.00GiB, Used:864.00KiB (0.02%)
>    /dev/sdb1       4.00GiB
>
> Metadata,RAID1: Size:3.00GiB, Used:1.69GiB (56.23%)
>    /dev/sda1       3.00GiB
>    /dev/sdb1       3.00GiB
>
> System,single: Size:32.00MiB, Used:144.00KiB (0.44%)
>    /dev/sdb1      32.00MiB
>
> System,RAID1: Size:8.00MiB, Used:80.00KiB (0.98%)
>    /dev/sda1       8.00MiB
>    /dev/sdb1       8.00MiB
>
> Unallocated:
>    /dev/sda1       6.47TiB
>    /dev/sdb1       5.67TiB
>
> So a small amount of actual data and metadata was still single on the
> disk I was rebuilding from (sdb), but it had massively allocated
> "single" chunks in the process (relatively equal to what I had in
> actual data), and to a lesser extent, metadata too.

Mounting a raid1 btrfs writable in degraded mode creates chunks with
the "single" profile. This is a long-standing issue. So when you wrote
data in degraded mode, btrfs had to allocate new chunks with the
"single" profile.

What is rather surprising is that you apparently have a chunk size of
819GiB, which is suspiciously close to 10% of 8TiB. btrfs does indeed
limit chunk size to 10% of total space, but a chunk should also never
exceed 10GiB. Could it be an Ubuntu-specific issue?

> Why didn't it free
> those up as it replaced the missing disk and duplicated the data in
> RAID1?

Device replacement restored the mirrored data (chunks with the "raid1"
profile) on the new device. It had no reason to touch chunks with the
"single" profile, because from btrfs's point of view those chunks never
had any data on the replaced device, so there was nothing to write
there.

> Shouldn't it all be RAID1 once it's complete,

No. btrfs replace restores the content of the missing device. It is not
a replacement for profile conversion.
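If the goal is a fully raid1 filesystem after replacing a disk that saw
degraded writes, the replace has to be followed by a conversion
balance. Roughly (untested sketch; the device id, partition paths and
mount point are the ones from your mail):

# rebuild the mirror content onto the new disk (devid 1 was missing)
btrfs replace start 1 /dev/sda1 /mnt/btrfs
# wait until the copy is finished
btrfs replace status /mnt/btrfs
# then convert any leftover "single" chunks back to raid1
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/btrfs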
> why even have
> such small amounts remain single? Easy fix I thought, as at first
> glance I didn't realize 800GiB was allocated single, only paying
> attention to the small amounts used, so I did a soft convert to fix
> this.
>
> sudo btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/btrfs
>
> Convert was pretty quick... took just a few minutes, but of course
> it's all allocated as raid1 now (with presumably 0 actual data in
> most of them):

Correct. To convert a profile, btrfs must allocate new chunks in the
new profile and copy the data over.

...

> My questions are:
> 1. Why did it have so many 'single' allocated chunks to begin with?

It does not look like "chunks"; it really looks like one "chunk". The
output of

btrfs inspect-internal dump-tree -d /dev/xxx

may be interesting.

> Everything was RAID1 all up until the disk replacement, so it clearly
> did this during the `btrfs replace` process.

No, it did it during the degraded writable mount.

> Did I do this wrong, or
> is there a bug?

There is a misfeature in that btrfs creates "single" chunks during a
degraded mount. Ideally it should create degraded raid1 chunks.

> 2. Would the btrfs replace have failed if the filesystem was more full
> and those chunks were not possible to allocate (it basically allocated
> double the amount of data I have after all, so if the fs was 50%+
> full...)?

btrfs replace duplicates the data that was on the missing device. If
you were able to write this data while the device was present, btrfs
replace cannot fail due to missing space (provided, of course, that the
new device is at least as large).

> 3. How do I prevent this from happening in the future, should I need
> to replace a disk?

Do not write anything in degraded mode. If you only need to read the
data while a device is missing, mount read-only (see the example at the
end of this mail).

> Is this possibly an Ubuntu-related issue (perhaps because btrfs-progs
> is older relative to the kernel)?
>
> The 7GiB metadata isn't so bad, however I did proceed to run
>
> btrfs balance start -dusage=0 /mnt/btrfs
>
> Is it possible to run balance with `-dusage=0` along with the convert
> to do that all in one balance? Obviously, that doesn't solve the
> actual issue to begin with, I'm just curious as I did it in two steps.
>
> FWIW: The `dusage=0` filter freed up pretty much everything as I
> expected it to, and it looks pretty much identical to how it did
> before the disk replacement:
>
> Data,RAID1: Size:819.00GiB, Used:818.64GiB (99.96%)
>    /dev/sda1     819.00GiB
>    /dev/sdb1     819.00GiB
>
> I'm willing to do the process all over again as all this data is on
> another system, I just would like assurance I don't run into this same
> issue twice.
>
> Thanks,
> -Jonah
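P.S. A minimal example of a read-only degraded mount (untested;
assuming /dev/sdb1 is the surviving device and /mnt/btrfs the mount
point, as in your mail):

# degraded allows mounting with a device missing; ro prevents writes
mount -o degraded,ro /dev/sdb1 /mnt/btrfs

Nothing is written to a filesystem mounted read-only, so no "single"
chunks can be allocated while the array is degraded.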