From: Su Yue <suy.fnst@cn.fujitsu.com>
To: Marc MERLIN <marc@merlins.org>, Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: <linux-btrfs@vger.kernel.org>
Subject: Re: So, does btrfs check lowmem take days? weeks?
Date: Fri, 29 Jun 2018 14:02:19 +0800
Message-ID: <02ba7ad4-b618-85f0-a99f-c43b25d367de@cn.fujitsu.com>
In-Reply-To: <20180629052825.tifg2aw7oy3qyyvw@merlins.org>



On 06/29/2018 01:28 PM, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote:
>>> lowmem repair seems to be going still, but it's been days and -p seems
>>> to do absolutely nothing.
>>
>> I'm afraid you hit a bug in lowmem repair code.
>> In any case, --repair shouldn't really be used unless you're pretty
>> sure the problem is something btrfs check can handle.
>>
>> That's also why --repair is still marked as dangerous.
>> Especially when it's combined with experimental lowmem mode.
> 
> Understood, but btrfs got corrupted (by itself or not, I don't know)
> I cannot mount the filesystem read/write
> I cannot btrfs check --repair it since that code will kill my machine
> What do I have left?
> 
>>> My filesystem is "only" 10TB or so, albeit with a lot of files.
>>
>> Unless you have tons of snapshots and reflinked (deduped) files, it
>> shouldn't take so long.
> 
> I may have a fair amount.
> gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2
> enabling repair mode
> WARNING: low-memory mode repair support is only partial
> Checking filesystem on /dev/mapper/dshelf2
> UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> Fixed 0 roots.
> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 3, have: 4
> Created new chunk [18457780224000 1073741824]
> Delete backref in extent [84302495744 69632]
> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 3, have: 4
> Delete backref in extent [84302495744 69632]
> ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 181, have: 240
> Delete backref in extent [125712527360 12214272]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 302, have: 431
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 356, have: 433
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 161, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 162, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 170, have: 249
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 172, have: 251
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 348, have: 418
> Delete backref in extent [150850146304 17522688]
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 555, have: 1449
> Deleted root 2 item[156909494272, 178, 5476627808561673095]
> ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 556, have: 1452
> Deleted root 2 item[156909494272, 178, 7338474132555182983]
> ERROR: file extent[374857 235184128] root 21872 owner 21872 backref lost
> Add one extent data backref [156909494272 55320576]
> ERROR: file extent[374857 235184128] root 22911 owner 22911 backref lost
> Add one extent data backref [156909494272 55320576]
> 
My bad.
This is most likely a bug in the extent checking of lowmem check,
which Chris reported too.
The extent check was wrong, so the repair did the wrong things.

I have figured out that the bug is that lowmem check can't deal with shared
tree blocks in the reloc tree. The fix is simple; you can try the following repo:

https://github.com/Damenly/btrfs-progs/tree/tmp1

Please run lowmem check without --repair first to see whether
your filesystem is fine.
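As a rough sketch (not part of btrfs-progs; the helper name lowmem_check_readonly and the BTRFS variable are hypothetical), the read-only run can be wrapped so its output is also saved for later comparison. Leaving out --repair is what keeps the check read-only; pointing BTRFS at a binary built from the tmp1 branch is how you would test the fix.

```shell
#!/bin/sh
# Hypothetical helper: run a lowmem check WITHOUT --repair (read-only) and
# keep a copy of the output. BTRFS defaults to the installed "btrfs"; set it
# to a binary built from the tmp1 branch to test the fix.
lowmem_check_readonly() {
    dev=$1
    log=${2:-lowmem-check.log}
    if [ -z "$dev" ]; then
        echo "usage: lowmem_check_readonly DEVICE [LOGFILE]" >&2
        return 2
    fi
    # Merge stderr into stdout so tee captures error messages as well.
    # Note: the pipeline's exit status is tee's, not btrfs check's.
    "${BTRFS:-btrfs}" check --mode=lowmem -p "$dev" 2>&1 | tee "$log"
}
```

For example, assuming the build leaves the binary at ./btrfs in the source tree: `BTRFS=./btrfs lowmem_check_readonly /dev/mapper/dshelf2 dshelf2-lowmem.log`.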

Though the bug and its symptoms are clear enough, I have to make a test
image before sending my patch. I have spent a week studying btrfs balance,
but it seems a little hard for me.

Thanks,
Su

> The last two ERROR lines took over a day to get generated, so I'm not sure if it's still working, but just slowly.
> For what it's worth, non-lowmem check used to take 12 to 24H on that filesystem back when it still worked.
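Since the raw output doesn't show when each line appeared, one option is to timestamp lines as they arrive. A minimal sketch in POSIX sh; timestamp_lines is a hypothetical helper name, not a btrfs-progs feature:

```shell
#!/bin/sh
# Prefix every input line with the local time it was read, so a saved log
# shows how long each step took. Line content is passed through unchanged.
timestamp_lines() {
    # IFS= keeps leading whitespace, -r keeps backslashes; the second test
    # handles a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}
```

Used as `btrfs check --mode=lowmem -p /dev/mapper/dshelf2 2>&1 | timestamp_lines | tee check.log`. One caveat: when stdout is a pipe, C stdio normally block-buffers it, so lines may arrive in bursts and the timestamp records arrival rather than the moment btrfs check printed them. Wrapping the command in coreutils `stdbuf -oL` may help if btrfs check relies on default stdio buffering. Calling date once per line costs a fork, which is negligible at this output rate.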
> 
>>> 2 things that come to mind
>>> 1) can lowmem have some progress working so that I know if I'm looking
>>> at days, weeks, or even months before it will be done?
>>
>> It's hard to estimate, especially when every cross check involves a lot
>> of disk IO.
>> But at least, we could add such an indicator to show we're doing something.
> 
> Yes, anything to show that I should still wait is still good :)
> 
>>> 2) non lowmem is more efficient obviously when it doesn't completely
>>> crash your machine, but could lowmem be given an amount of memory to use
>>> for caching, or maybe use some heuristics based on RAM free so that it's
>>> not so excruciatingly slow?
>>
>> IIRC a recent commit added the ability.
>> a5ce5d219822 ("btrfs-progs: extent-cache: actually cache extent buffers")
>   
> Oh, good.
> 
>> That's already included in btrfs-progs v4.13.2.
>> So it's probably an infinite loop that the lowmem repair code can't handle.
> 
> I see. Is there any reasonably easy way to check on this running process?
> 
> Both top and iotop show that it's working, but of course I can't tell if
> it's looping, or not.
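One rough heuristic, assuming the check output is being saved to a file (for example with tee): if repair is stuck in a loop that keeps printing, the same extent should be reported for the same root again and again. In the log above, each extent/root pair (roots 21872 and 22911) is reported once, so more than one report per pair is suspicious. The function name and threshold are hypothetical, and the field layout is inferred from the quoted ERROR lines rather than from any documented format. If the loop prints nothing at all, this cannot detect it.

```shell
#!/bin/sh
# Hypothetical heuristic: list (extent, root) pairs that appear in
# "referencer count mismatch" lines more often than a threshold (default 1).
# Usage: find_repeated_mismatches LOGFILE [THRESHOLD]
find_repeated_mismatches() {
    awk -v limit="${2:-1}" '
    /referencer count mismatch/ {
        line = $0
        sub(/.*extent\[/, "", line)     # drop everything before the extent bytenr
        gsub(/[^0-9]+/, " ", line)      # keep only the numbers, space-separated
        # Expected order: bytenr length root owner offset wanted have
        if (split(line, f, " ") < 7) next
        seen[f[1] " root " f[3]]++
    }
    END {
        for (k in seen)
            if (seen[k] > limit) printf "extent %s reported %d times\n", k, seen[k]
    }
    ' "$1" | sort
}
```

Running it every so often against the growing log (e.g. `find_repeated_mismatches check.log`) prints nothing while every pair has been reported only once.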
> 
> Then again, maybe it already fixed enough that I can mount my filesystem again.
> 
> But back to the main point, it's sad that after so many years, the
> repair situation is still so suboptimal, especially when it's apparently
> pretty easy for btrfs to get damaged (through its own fault or not, hard
> to say).
> 
> Thanks,
> Marc
> 



