From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: dsterba@suse.cz, Qu Wenruo <wqu@suse.de>,
	Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
Date: Tue, 17 Jul 2018 21:32:27 +0800
Message-ID: <1e42f435-c769-015f-8039-0d68a5d8e1af@gmx.com>
In-Reply-To: <20180717123329.GX3126@twin.jikos.cz>


On 2018-07-17 20:33, David Sterba wrote:
> On Mon, Jul 16, 2018 at 09:57:43PM +0800, Qu Wenruo wrote:
>>>>> -EUCLEAN ?
>>>>
>>>> Either works for me.
>>>
>>> That's not just a cosmetic change, there's a semantic difference between
>>> the error codes. I should maybe make that more explicit and not expect
>>> it to be obvious.
>>>
>>> ENOENT does not make much sense in this context, the caller (mount in
>>> this case) cannot do anything about a code that says 'some internal
>>> structure not found'.
>>
>> The point here is, if every self-checker should only return -EUCLEAN, it
>> won't really indicate what's going wrong, except points to some
>> self-checker (and such self-checkers are growing larger than our
>> expectation already).
>>
>> My practice here is, put some human readable and meaningful error
>> message. No matter what we choose to return, the error message should
>> tell us what's going wrong.
>>
>> In this case, I don't really care about the return value. If it's explicitly
>> needed to return -EUCLEAN, I could make all existing checkers (from
>> tree-checker to chunk/bg/dev-extent checker) return -EUCLEAN if
>> anything is wrong (and save several "ret = -EUCLEAN" lines).
>> The return value doesn't really have much meaning nowadays; it's the
>> error message that is important now.
> 
> Ok, I see what you mean. The message is important as it's otherwise
> almost impossible to find where exactly the mount fails.
> 
> The error messages perhaps fall into several categories:
> 
> 1) transient errors, some failure that happens before the filesystem state
>    is fully examined
> 
> this is namely ENOMEM, or EINTR eg. returned by kthread_run

This criterion is a little misleading, or did I misunderstand your category?

Of the example error numbers, ENOMEM is the only one I find
straightforward enough for an end user/developer that no error message
is needed to explain it.
Or is this category just for errors that need no error message (or that
can be handled by btrfs-progs without any user intervention/decision)?

> 
> maybe also a failure on a multi-device filesystem when the devices
> haven't been scanned yet
> 
> 2) clearly some corruption/consistency condition, with enough information
>    available to decide
> 
> like a missing tree, most of the tree-checker would fall into this
> category

This is pretty clear.

> 
> 3) same as the previous one, but there's some external condition preventing
>    a full check
> 
> that's eg. a real EIO after reading a tree block

Do you mean the csum mismatch EIO that comes with an error message, or a
real error from the underlying layer, such as an ATA command failure?

> 
> 
> The error codes are IMO important to see how severe the problems are and
> what's the expected solution. 2 is for 'check', 3 may need degraded
> mount, 1 needs maybe more time to mount again.

Category 2 clearly calls for 'check'.
For the other 2 categories it's a little hard to say.
Normally, if we really hit an error we don't expect, in most cases
the filesystem is already corrupted (e.g. a lot of balance-resume and
mount failures finally turn out to be fs corruption).

If the category is determined by the expected solution, most errors will
just fall into category 2), including most of the errors we currently
have in the btrfs module.
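
To make the proposal concrete, the quoted mapping from error code to
expected action could be sketched roughly as below. This is only an
illustration in userspace C, not code from the patchset: the enum, both
function names and the hint strings are hypothetical, and the default
case stands for the errors whose category is unclear.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* EUCLEAN is Linux-specific; assumed fallback value for other systems. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/* Hypothetical hints, one per category from the quoted proposal. */
enum demo_mount_hint {
	DEMO_HINT_RETRY,	/* category 1: transient, try mounting again */
	DEMO_HINT_CHECK,	/* category 2: corruption, run check */
	DEMO_HINT_DEGRADED,	/* category 3: external condition, maybe degraded mount */
	DEMO_HINT_UNKNOWN,	/* anything not covered by the three categories */
};

/* Accepts either a positive errno or a kernel-style negative value. */
static enum demo_mount_hint demo_classify_mount_error(int err)
{
	int code = err < 0 ? -err : err;

	switch (code) {
	case ENOMEM:
	case EINTR:
		return DEMO_HINT_RETRY;
	case EUCLEAN:
		return DEMO_HINT_CHECK;
	case EIO:
		return DEMO_HINT_DEGRADED;
	default:
		return DEMO_HINT_UNKNOWN;
	}
}

/* Maps a hint to a short human-readable string. */
static const char *demo_hint_name(enum demo_mount_hint hint)
{
	static const char *const names[] = {
		"retry mount", "run check", "try degraded mount", "unknown",
	};

	assert((size_t)hint < sizeof(names) / sizeof(names[0]));
	return names[hint];
}
```

With such a table, -ENOENT ends up in the "unknown" bucket, which
matches the point that it tells the caller of mount nothing actionable.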

> 
> With the error messages in place, 2 can be completely covered by
> EUCLEAN. I briefly skimmed a few call paths and think that the 3
> categories should be enough, but I'm also expecting some exceptions that
> can be decided case by case.

For category 2), I think it's pretty clear and practical to use EUCLEAN.

For the other categories I'm not really sure.

E.g. what happens if we can't find a certain backref when running delayed
refs? It's either a kernel bug or a corrupted fs.
Which category should it fit? Category 2? But we don't really know
what's going wrong.
Category 1/3? It won't really be fixed until we fix the bug or the fs.

More detailed examples would definitely help me understand the categories.

> 
> The error codes are now not consistent, lots of EUCLEAN are historically
> EIO, but before we start cleaning that up we should have at least some
> guidelines. Please let me know what you think.
> 
At least for self-verification code it's pretty clear that we should
have an error message stating what's going wrong and what we expect,
with an explicit EUCLEAN error number.
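
A minimal sketch of that pattern, assuming simplified stand-in
structures: the checker reports what it found and what it expected, then
returns -EUCLEAN. This is userspace C for illustration only, not the
code from the patch; demo_chunk, demo_block_group and
demo_check_chunk_has_bg() are hypothetical names, and the message goes
into a caller-supplied buffer instead of the kernel log.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* EUCLEAN is Linux-specific; assumed fallback value for other systems. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

/* Hypothetical, simplified stand-ins for a chunk and its block group. */
struct demo_chunk {
	uint64_t start;
	uint64_t length;
};

struct demo_block_group {
	uint64_t start;
	uint64_t length;
};

/*
 * Returns 0 if @bg matches @chunk. Otherwise writes a message naming
 * what was found and what was expected into @msg and returns -EUCLEAN.
 * Pass bg == NULL to model a chunk without any block group.
 */
static int demo_check_chunk_has_bg(const struct demo_chunk *chunk,
				   const struct demo_block_group *bg,
				   char *msg, size_t msg_len)
{
	assert(chunk != NULL && msg != NULL && msg_len > 0);
	msg[0] = '\0';

	if (bg == NULL) {
		snprintf(msg, msg_len,
			 "chunk start=%" PRIu64 " len=%" PRIu64
			 " has no corresponding block group",
			 chunk->start, chunk->length);
		return -EUCLEAN;
	}
	if (bg->start != chunk->start || bg->length != chunk->length) {
		snprintf(msg, msg_len,
			 "block group mismatch: have start=%" PRIu64
			 " len=%" PRIu64 ", expect start=%" PRIu64
			 " len=%" PRIu64,
			 bg->start, bg->length, chunk->start, chunk->length);
		return -EUCLEAN;
	}
	return 0;
}
```

The return value is the same for both failures; only the message tells
them apart, which is the reason the message has to carry the details.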

Thanks,
Qu


