From: Goffredo Baroncelli <kreijack@inwind.it>
To: Christoph Anton Mitterer <calestyo@scientia.net>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Ongoing Btrfs stability issues
Date: Tue, 13 Mar 2018 20:36:25 +0100
Message-ID: <d6e007af-7980-3d9b-a497-acb3be90dac9@inwind.it>
In-Reply-To: <1520891338.4266.16.camel@scientia.net>

On 03/12/2018 10:48 PM, Christoph Anton Mitterer wrote:
> On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote:
>> Unfortunately no, the likelihood might be 100%: there are some
>> patterns which trigger this problem quite easily. See the link which
>> I posted in my previous email. There was a program which creates a
>> bad checksum (in COW+DATASUM mode), and the file became unreadable.
> But that rather seems like a plain bug?!

You are right; unfortunately it seems that it is classified as WONT-FIX :(

> No reason that would conceptually make checksumming+nodatacow
> impossible.
> 
> AFAIU, the conceptual thing would be about:
> - data is written in nodatacow
>   => thus a checksum must be written as well, so write it
> - what can then of course happen is
>   - both csum and data are written => fine
>   - csum is written but data not and then some crash => csum will show
>     that => fine
>   - data is written but csum not and then some crash => csum will give
>     false positive
> 
> Still better a few false positives than many unnoticed data
> corruptions and no true raid repair.

A checksum mismatch is returned as -EIO by a read() syscall. This is an event that most programs handle badly.
For example, suppose that a page of a VM RAM image file has a wrong checksum. When the VM starts, it tries to read the page, gets -EIO and aborts. It may not even be able to print which page is corrupted. In this case, how can the user understand the problem, and what can they do?
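
To make this concrete, here is a minimal sketch (Python, purely illustrative, not taken from any real VM loader) of what a program would have to do in order to at least tell the user which pages are unreadable. The function name find_unreadable_pages, the 4096-byte page size and the injectable pread parameter are my own assumptions; the only real interfaces used are os.pread() and errno.EIO.

```python
import errno
import os

PAGE_SIZE = 4096  # assumed page size, for illustration only


def find_unreadable_pages(fd, size, page_size=PAGE_SIZE, pread=None):
    """Return the offsets of all pages whose read fails with EIO.

    `pread` is injectable so the logic can be exercised without a
    damaged file; by default the real os.pread() is used.
    """
    if pread is None:
        pread = os.pread
    bad_offsets = []
    for offset in range(0, size, page_size):
        try:
            pread(fd, page_size, offset)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise  # some other failure: do not hide it
            bad_offsets.append(offset)  # remember where the EIO happened
    return bad_offsets
```

Most programs do nothing like this: they treat the first -EIO as fatal and exit without reporting the offset.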


[....]

> 
>> Again, you are assuming that the likelihood of having a bad checksum
>> is low. Unfortunately this is not true. There are patterns which
>> exploit this bug with a likelihood=100%.
> 
> Okay I don't understand why this would be so and wouldn't assume that
> the IO pattern can affect it heavily... but I'm not really btrfs
> expert.
> 
> My blind assumption would have been that writing an extent of data
> takes much longer to complete than writing the corresponding checksum.

The problem is the following: there is a time window between the checksum computation and the writing of the data to the disk (which is done at a lower level via a DMA channel). If the data is updated within this window, the checksum will mismatch. This happens if we have two threads, where the first commits the data to the disk and the second one updates the data (I think that both VMs and databases could behave so).
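
The window can be shown with a small deterministic simulation. This is only a sketch under my own assumptions: it is plain userspace Python, not btrfs code; zlib.crc32 stands in for the crc32c that btrfs uses; and the names write_with_csum, read_verified and second_thread are hypothetical. The "second thread" is modelled as a callback that runs between the checksum computation and the copy to "disk".

```python
import errno
import zlib


def write_with_csum(buf, modify_in_flight=None):
    """Simulate one write: checksum first, copy to 'disk' afterwards."""
    csum = zlib.crc32(bytes(buf))   # step 1: checksum of the page as it is now
    if modify_in_flight is not None:
        modify_in_flight(buf)       # step 2: another thread updates the page
    on_disk = bytes(buf)            # step 3: the "DMA" copies whatever is there
    return on_disk, csum


def read_verified(on_disk, csum):
    """Simulate a read with datasum: a mismatch becomes EIO."""
    if zlib.crc32(on_disk) != csum:
        raise OSError(errno.EIO, "checksum mismatch")
    return on_disk


if __name__ == "__main__":
    page = bytearray(b"A" * 4096)

    def second_thread(buf):
        buf[0] = 0x42  # a one-byte update inside the window is enough

    on_disk, csum = write_with_csum(page, second_thread)
    try:
        read_verified(on_disk, csum)
    except OSError as exc:
        print("read failed:", exc)
```

Note that no crash is involved: the data on disk is exactly what the application last wrote, yet every later read fails.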

In btrfs, a checksum mismatch produces an -EIO error on read. In a conventional filesystem (or a btrfs filesystem without datasum) there is no checksum, so this problem doesn't exist.

I am curious how ZFS solves this problem.

However I have to point out that this problem is not solved by COW. COW solves only the problem of an interrupted commit of the filesystem, where the data is updated in place (so it is visible to the user) but the metadata is not.


> 
> Even if not... it should only be a problem in case of a crash during
> that... and then I'd still prefer to get the false positive than bad
> data.

How can you know whether it is "bad data" or a "bad checksum"?


> 
> 
> Anyway... it's not going to happen so the discussion is pointless.
> I think people can probably use dm-integrity (which btw: does no CoW
> either (IIRC) and still can provide integrity... ;-) ) to see whether
> their data is valid.
> Not nice, but since it won't change on btrfs, a possible alternative.

Even in this case I am curious how dm-integrity would solve this issue.

> 
> 
> Cheers,
> Chris.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
