* CRC mismatch
@ 2018-10-16 15:30 Anton Shepelev
  2018-10-16 15:42 ` Austin S. Hemmelgarn
  2018-10-18 12:02 ` Anton Shepelev
  0 siblings, 2 replies; 7+ messages in thread
From: Anton Shepelev @ 2018-10-16 15:30 UTC
  To: linux-btrfs

Hello, all

What may be the reason for a CRC mismatch on a BTRFS file in
a virtual machine:

   csum failed ino 175524 off 1876295680 csum 451760558
   expected csum 1446289185

Shall I seek the culprit in the host machine or in the guest
one?  Supposing the host machine is healthy, what operations
on the guest might have caused a CRC mismatch?
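For readers following the log line: BTRFS keeps a checksum for every data sector, and on read it recomputes the checksum and compares it with the stored one. Presumably `csum` is the value computed from the data just read and `expected csum` is the value recorded at write time; the `off` field appears to be a byte offset within the file. The sketch below assumes the default CRC-32C algorithm and a 4 KiB sector size (both are assumptions, and newer kernels also offer other checksum algorithms); the helper names are hypothetical. It shows that a single flipped bit is enough to produce a mismatch like the one reported.

```python
# Hypothetical sketch, not BTRFS source code. Assumes the default crc32c
# data checksum and a 4 KiB sector size; both are assumptions here.

SECTOR_SIZE = 4096          # assumed data sector size in bytes
CRC32C_POLY = 0x82F63B78    # reflected Castagnoli polynomial


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def sector_index(file_offset: int) -> int:
    """Which sector of the file a byte offset falls in."""
    return file_offset // SECTOR_SIZE


def verify_sector(sector: bytes, expected_csum: int) -> bool:
    """Recompute the checksum of one sector and compare with the stored one."""
    return crc32c(sector) == expected_csum


if __name__ == "__main__":
    sector = bytes(SECTOR_SIZE)                    # a sector of zeros, as written
    stored = crc32c(sector)                        # checksum recorded at write time
    damaged = bytearray(sector)
    damaged[100] ^= 0x01                           # a single bit flipped at rest
    print(verify_sector(sector, stored))           # True
    print(verify_sector(bytes(damaged), stored))   # False
    print(sector_index(1876295680))                # 458080
```

Under the 4 KiB assumption, the reported offset 1876295680 is exactly sector-aligned, which is consistent with a whole-sector checksum failure.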

-- 
()  ascii ribbon campaign - against html e-mail
/\  http://preview.tinyurl.com/qcy6mjc [archived]

* Re: CRC mismatch
  2018-10-16 15:30 CRC mismatch Anton Shepelev
@ 2018-10-16 15:42 ` Austin S. Hemmelgarn
  2018-10-16 20:27   ` Chris Murphy
  2018-10-18 12:02 ` Anton Shepelev
  1 sibling, 1 reply; 7+ messages in thread
From: Austin S. Hemmelgarn @ 2018-10-16 15:42 UTC
  To: Anton Shepelev, linux-btrfs

On 2018-10-16 11:30, Anton Shepelev wrote:
> Hello, all
> 
> What may be the reason for a CRC mismatch on a BTRFS file in
> a virtual machine:
> 
>     csum failed ino 175524 off 1876295680 csum 451760558
>     expected csum 1446289185
> 
> Shall I seek the culprit in the host machine or in the guest
> one?  Supposing the host machine is healthy, what operations
> on the guest might have caused a CRC mismatch?
> 
Possible causes include:

* On the guest side:
   - Unclean shutdown of the guest system (not likely even if this did 
happen).
   - A kernel bug in the guest.
   - Something directly modifying the block device (also not very likely).

* On the host side:
   - Unclean shutdown of the host system without properly flushing data 
from the guest.  Not likely unless you're using an actively unsafe 
caching mode for the guest's storage back-end.
   - At-rest data corruption in the storage back-end.
   - A bug in the host-side storage stack.
   - A transient error in the host-side storage stack.
   - A bug in the hypervisor.
   - Something directly modifying the back-end storage.

Of these, the statistically most likely location for the issue is 
probably the storage stack on the host.

* Re: CRC mismatch
  2018-10-16 15:42 ` Austin S. Hemmelgarn
@ 2018-10-16 20:27   ` Chris Murphy
  2018-10-17  9:15     ` Anton Shepelev
  2018-10-17 11:59     ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 7+ messages in thread
From: Chris Murphy @ 2018-10-16 20:27 UTC
  To: Austin S. Hemmelgarn; +Cc: Anton Shepelev, Btrfs BTRFS

On Tue, Oct 16, 2018 at 9:42 AM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> Possible causes include:
>
> * On the guest side:
>   - Unclean shutdown of the guest system (not likely even if this did
> happen).
>   - A kernel bug in the guest.
>   - Something directly modifying the block device (also not very likely).
>
> * On the host side:
>   - Unclean shutdown of the host system without properly flushing data from
> the guest.  Not likely unless you're using an actively unsafe caching mode
> for the guest's storage back-end.
>   - At-rest data corruption in the storage back-end.
>   - A bug in the host-side storage stack.
>   - A transient error in the host-side storage stack.
>   - A bug in the hypervisor.
>   - Something directly modifying the back-end storage.
>
> Of these, the statistically most likely location for the issue is probably
> the storage stack on the host.

Is there still that O_DIRECT related "bug" (or more of a limitation)
if the guest is using cache=none on the block device?
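Chris's question refers to a limitation of checksumming combined with direct I/O. One commonly given explanation, offered here as an assumption rather than a diagnosis of Anton's setup, is that with O_DIRECT the device reads straight from the writer's own buffer, so if that buffer changes after the checksum is computed but before the I/O completes, the data on disk no longer matches its checksum. A toy simulation of that ordering, with hypothetical function names and `zlib.crc32` standing in for the real checksum:

```python
# Toy model of the direct-I/O checksum race; hypothetical, not kernel code.
# zlib.crc32 stands in for the filesystem's real checksum algorithm.
import zlib


def direct_write(buf: bytearray, disk: dict, csums: dict, block: int,
                 modified_in_flight: bool) -> None:
    """Checksum the caller's buffer, then 'DMA' that same buffer to disk."""
    csums[block] = zlib.crc32(buf)   # 1. filesystem checksums the buffer
    if modified_in_flight:
        buf[0] ^= 0xFF               # 2. caller rewrites the buffer mid-I/O
    disk[block] = bytes(buf)         # 3. device reads whatever is there now


def read_ok(disk: dict, csums: dict, block: int) -> bool:
    """Verify a block on read, as a checksumming filesystem would."""
    return zlib.crc32(disk[block]) == csums[block]


if __name__ == "__main__":
    disk, csums = {}, {}
    direct_write(bytearray(b"x" * 4096), disk, csums, 0, modified_in_flight=False)
    direct_write(bytearray(b"x" * 4096), disk, csums, 1, modified_in_flight=True)
    print(read_ok(disk, csums, 0))   # True
    print(read_ok(disk, csums, 1))   # False
```

Nothing is wrong with the storage in the second case; the mismatch comes purely from the ordering of steps 1 to 3.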

Anton, what virtual machine technology are you using? qemu/kvm managed
with virt-manager? If I remember correctly, the configuration affects
host behavior, but the negative effect manifests inside the guest as
corruption.

-- 
Chris Murphy

* Re: CRC mismatch
  2018-10-16 20:27   ` Chris Murphy
@ 2018-10-17  9:15     ` Anton Shepelev
  2018-10-17 11:59     ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 7+ messages in thread
From: Anton Shepelev @ 2018-10-17  9:15 UTC
  To: linux-btrfs

[I accidentally replied to Chris instead of the mailing list.]
Chris Murphy:

>Is there still that O_DIRECT related "bug" (or more of a
>limitation) if the guest is using cache=none on the block
>device?

I know nothing about it.

>Anton, what virtual machine technology are you using?
>qemu/kvm managed with virt-manager?  If I remember
>correctly, the configuration affects host behavior, but the
>negative effect manifests inside the guest as corruption.

This is a commercial system running inside VMware.

* Re: CRC mismatch
  2018-10-16 20:27   ` Chris Murphy
  2018-10-17  9:15     ` Anton Shepelev
@ 2018-10-17 11:59     ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 7+ messages in thread
From: Austin S. Hemmelgarn @ 2018-10-17 11:59 UTC
  To: Chris Murphy; +Cc: Anton Shepelev, Btrfs BTRFS

On 2018-10-16 16:27, Chris Murphy wrote:
> Is there still that O_DIRECT related "bug" (or more of a limitation)
> if the guest is using cache=none on the block device?
I had actually forgotten about this, and I'm not quite sure if it's 
fixed or not.

* Re: CRC mismatch
  2018-10-16 15:30 CRC mismatch Anton Shepelev
  2018-10-16 15:42 ` Austin S. Hemmelgarn
@ 2018-10-18 12:02 ` Anton Shepelev
  2018-10-18 12:34   ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 7+ messages in thread
From: Anton Shepelev @ 2018-10-18 12:02 UTC
  To: linux-btrfs

I wrote:

>What may be the reason for a CRC mismatch on a BTRFS file in
>a virtual machine:
>
>csum failed ino 175524 off 1876295680 csum 451760558
>expected csum 1446289185
>
>Shall I seek the culprit in the host machine or in the
>guest one?  Supposing the host machine is healthy, what
>operations on the guest might have caused a CRC mismatch?

Thank you, Austin and Chris, for your replies.  While
describing the problem to the client, I tried again to copy
the corrupt file, and this time it copied without error,
which is of course scary, because errors that miraculously
disappear may reappear just as suddenly.

* Re: CRC mismatch
  2018-10-18 12:02 ` Anton Shepelev
@ 2018-10-18 12:34   ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 7+ messages in thread
From: Austin S. Hemmelgarn @ 2018-10-18 12:34 UTC
  To: Anton Shepelev, linux-btrfs

On 18/10/2018 08.02, Anton Shepelev wrote:
> Thank you, Austin and Chris, for your replies.  While
> describing the problem to the client, I tried again to copy
> the corrupt file, and this time it copied without error,
> which is of course scary, because errors that miraculously
> disappear may reappear just as suddenly.
> 
If the filesystem was using a profile that supports repairs (pretty much
anything except the single or raid0 profiles), then BTRFS will have
fixed that particular block for you automatically.

Of course, the other possibility is that it was a transient error in the
block layer that caused it to return bogus data when the data on disk
was in fact correct.
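Austin's first point can be pictured as read repair: with a redundant profile there is more than one copy of each block, so a read that fails verification can fall back to another copy and rewrite the bad one. The toy model below is a hypothetical sketch, not the BTRFS implementation; `zlib.crc32` stands in for the real checksum and the function name is made up. His second possibility, a transient block-layer error, needs no repair at all: a later read of the same, intact copy simply passes verification.

```python
# Toy model of read repair on a redundant profile (e.g. DUP or RAID1).
# Hypothetical sketch, not BTRFS code; zlib.crc32 stands in for crc32c.
import zlib
from typing import List


def read_with_repair(copies: List[bytearray], expected_csum: int) -> bytes:
    """Return the first copy whose checksum matches; overwrite bad ones."""
    bad = []
    for i, copy in enumerate(copies):
        if zlib.crc32(copy) == expected_csum:
            for j in bad:
                copies[j][:] = copy   # rewrite each corrupt copy seen so far
            return bytes(copy)
        bad.append(i)
    # No good copy left: the read fails, as it would with a single-copy
    # profile whose only copy is corrupt.
    raise OSError("csum failed on every copy")


if __name__ == "__main__":
    good = bytearray(b"A" * 4096)
    stored = zlib.crc32(good)          # checksum recorded at write time
    corrupt = bytearray(good)
    corrupt[10] ^= 0x01                # first copy damaged at rest
    copies = [corrupt, bytearray(good)]
    data = read_with_repair(copies, stored)
    print(data == bytes(good), copies[0] == good)   # True True
```

After the first successful read the damaged copy has been rewritten, so a second read of the file succeeds without any error, which matches what Anton observed.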
