* uknown issues - different sha256 hash - files corruption
@ 2016-01-24 22:00 John Smith
  2016-01-25  0:06 ` Patrik Lundquist
  2016-01-25  0:28 ` Duncan
  0 siblings, 2 replies; 17+ messages in thread
From: John Smith @ 2016-01-24 22:00 UTC (permalink / raw)
  To: linux-btrfs

Dear,

I have a cubox-i4 running Debian with a 4.4 kernel. An Icy Box
IB-3664SU3 enclosure is attached to the cubox through the eSATA port;
the enclosure uses JM393 and JM539 chipsets.

I use a btrfs volume in raid0 created from two drives, and an lvm ext4
volume that also spans two drives. When I copy a big file (the one I
copied is 130GB) from ext4 to btrfs using rsync, the sha256 hash
differs.

I ran 2 tests: each time I copied the source file from ext4 to btrfs
and computed the sha256 hash. Each time the destination file on btrfs
had a different hash from the source file on ext4, and the hashes of
the two target files on btrfs also differ from each other.

I ran cmp -l <(hexdump source_file_ext4) <(hexdump target_file_btrfs).
A snapshot of the result is here: http://paste.debian.net/367678/
There are many bytes with differences. The size of the source and
target file is exactly the same.
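
A note on the comparison above: when cmp -l is fed hexdump output, the
positions it reports are positions in the hexdump text, not in the
files themselves. A minimal sketch of a direct byte-level comparison
follows, assuming cmp and awk are available. The function name
diff_summary is made up for illustration.

```shell
# Summarize byte-level differences between two files.
# `cmp -l` prints one line per differing byte: the 1-based offset,
# followed by the two byte values in octal.
diff_summary() {
    local a=$1 b=$2
    # cmp exits non-zero when the files differ, so its status is ignored here.
    { cmp -l "$a" "$b" || true; } | awk '
        NR == 1 { first = $1 }
                { last = $1 }
        END {
            if (NR == 0) print "identical"
            else printf "%s differing bytes, offsets %s..%s\n", NR, first, last
        }'
}
```

Calling diff_summary source_file_ext4 target_file_btrfs would then show
how many bytes differ and where the first and last differences sit in
the file.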


I also copied a data set of around 600GB that contains small files,
music, videos, etc., and ran sha256 on all the files, ext4 vs btrfs;
all was fine.

Any idea what could cause this issue, or how I can debug it in more detail?


Thank you!

Best,
John


* Re: uknown issues - different sha256 hash - files corruption
  2016-01-24 22:00 uknown issues - different sha256 hash - files corruption John Smith
@ 2016-01-25  0:06 ` Patrik Lundquist
       [not found]   ` <CAAcrkY+3w--OGYWwben+KYohdqwBBryDn8REJ6tiBk4jM3Tp9w@mail.gmail.com>
  2016-01-25  0:28 ` Duncan
  1 sibling, 1 reply; 17+ messages in thread
From: Patrik Lundquist @ 2016-01-25  0:06 UTC (permalink / raw)
  To: John Smith; +Cc: linux-btrfs

The data must have become corrupted before it was written to the btrfs
volume, since you can read it back without data checksum errors.

Try copying the big file a couple of times but from btrfs to ext4 to
see if you get data checksum errors.

Run memcheck and long SMART tests on the disks.


* Re: uknown issues - different sha256 hash - files corruption
  2016-01-24 22:00 uknown issues - different sha256 hash - files corruption John Smith
  2016-01-25  0:06 ` Patrik Lundquist
@ 2016-01-25  0:28 ` Duncan
  2016-01-25  0:48   ` Chris Murphy
  1 sibling, 1 reply; 17+ messages in thread
From: Duncan @ 2016-01-25  0:28 UTC (permalink / raw)
  To: linux-btrfs

My immediate first question is what happens if you create another lvm 
ext4 on the two devices you're creating the btrfs on?  Does the file 
have the same sha256 in that case?

Second question, have you run badblocks on the devices in question, and 
what's their smart status (smartctl -A)?

Does repeatedly rsyncing the same file over itself trigger different 
sha256 hashes each time?  Does that result in more hexdump diffs or 
fewer, and do they occur at roughly the same spots in the file or do they 
move around?

What about copying the same file twice (to different subdirs or 
something), so it exists twice on the destination device?  Does that 
change where the diffs occur, and do the two copies on the same btrfs 
differ (presumably yes, since copying it twice yielded different hashes)?

What about copies from the btrfs to somewhere else on the same btrfs 
(being sure to actually copy the data, not create reflinks)?  Do both 
copies then have the same hash or does it change yet again, and if so, 
are the diffs in the same place or not?
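
The repeated-copy tests above could be scripted along these lines.  This 
is only a sketch: the function names copy_and_hash and distinct_hashes 
are made up, and it assumes sha256sum, seq and awk are available.

```shell
# Copy SRC into DESTDIR N times, printing the sha256 of the source and of
# each copy. `cat` is used so that every copy really moves the data
# through a read and a write instead of creating a reflink.
# Note: a hash taken right after the copy may be served from the page
# cache, so it does not necessarily reflect what is on disk.
copy_and_hash() {
    local src=$1 destdir=$2 n=${3:-3} i
    sha256sum "$src"
    for i in $(seq 1 "$n"); do
        cat "$src" > "$destdir/copy.$i"
        sha256sum "$destdir/copy.$i"
    done
}

# Count the distinct hashes in copy_and_hash output read from stdin.
# A result of 1 means the source and every copy matched.
distinct_hashes() {
    awk '{ print $1 }' | sort -u | wc -l | tr -d ' '
}
```

Running copy_and_hash with source and destination on the same btrfs, and 
then with the source on lvm, gives the btrfs-to-btrfs and lvm-to-btrfs 
cases side by side.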

And does an overnight memtest run come up good or not?

The interesting thing with the linked hexdump diff is that it's only 38 
bytes different, and they're all in a single 39-byte sequence (there's 
apparently one byte that's the same in the 39 bytes, ...435, so only 38 
bytes different), at just over 38 GB, between 35 and 36 GiB, into the 
file.  That's not on a nice, even boundary and doesn't reoccur say every 
36 GiB or something, so the problem is unlikely to be a block offset 
issue.
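
The GB versus GiB arithmetic and the boundary check can be reproduced 
with shell integer math.  The offset used below is hypothetical, chosen 
only to land "just over 38 GB"; it is not taken from the paste, and the 
4 KiB block size is likewise an assumption.

```shell
# Express a byte offset in decimal GB, in binary GiB, and as a position
# within a 4 KiB block (0 would mean the offset is block-aligned).
describe_offset() {
    local off=$1
    echo "GB=$(( off / 1000000000 )) GiB=$(( off / 1073741824 )) in_4KiB_block=$(( off % 4096 ))"
}

describe_offset 38000000435   # hypothetical offset, for illustration only
```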

It could be bad blocks on the devices in question or bad eSATA 
connections to them, but ordinarily, btrfs would catch that due to its 
own checksumming, and would fail the file read at the bad block, which it 
isn't doing here.  That would tend to indicate that btrfs is saving and 
returning exactly what it was given in the first place, and that the data 
was bad by the time btrfs got it.

But it could be bad memory or a faulty network, such that the data is 
already bad by the time btrfs gets it, so btrfs checksums the bad data 
and faithfully returns exactly what it was given.

If it's bad memory, then local btrfs to btrfs copies should show random 
differences as well.  If it's a bad network, then local copies should be 
fine, but transfer over the network to ext4 on lvm should turn up random 
differences.


Meanwhile, cubox-i4 means little to me, but FWIW google says freescale 
iMX6 CPU.  But the evidence so far isn't pointing to an arch-specific 
bug.    

I did see, however, a footnote to the effect that while the network port 
is gigabit Ethernet, it's hardware limited to 400-something megabit due 
to bus size and speed on the cubox.  If indicators point to the network 
as being at fault, you might try manually setting it to 100 megabit 
Ethernet instead of gigabit.  That will likely throttle things down far 
enough to stabilize things.  Given the evidence so far, I'd put the chance 
of it being network-transfer corruption at 80% or better, and if so, I'd 
give manually setting 100 megabit speed around a 90% chance of fixing it.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: uknown issues - different sha256 hash - files corruption
  2016-01-25  0:28 ` Duncan
@ 2016-01-25  0:48   ` Chris Murphy
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2016-01-25  0:48 UTC (permalink / raw)
  To: Btrfs BTRFS

re: network induced corruption

Some hints should be possible with either

# ip link
# ip -s link show <specific link>

or

# ifconfig

It should show packet errors. Check this on both sides of the network
connection. It seems kinda unlikely to me that a lot of data can go
through two NICs, have this much corruption, and not have some
indication there are network problems.
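
The same counters can also be read straight from sysfs.  A small sketch, 
assuming a Linux system that exposes /sys/class/net; the function name 
nic_error_counters is made up, and its second argument exists only so 
the function can be pointed at a different directory.

```shell
# Print error and drop counters for one network interface from sysfs.
nic_error_counters() {
    local ifname=$1 base=${2:-/sys/class/net} name
    for name in rx_errors tx_errors rx_dropped tx_dropped rx_crc_errors; do
        printf '%s %s\n' "$name" "$(cat "$base/$ifname/statistics/$name")"
    done
}
```

For example, nic_error_counters eth0 (the interface name is a 
placeholder) run on both ends of the connection before and after a 
transfer shows whether any counter moved.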


Chris Murphy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: uknown issues - different sha256 hash - files corruption
       [not found]   ` <CAAcrkY+3w--OGYWwben+KYohdqwBBryDn8REJ6tiBk4jM3Tp9w@mail.gmail.com>
@ 2016-01-25  9:03     ` Patrik Lundquist
  2016-01-25 16:53       ` John Smith
  0 siblings, 1 reply; 17+ messages in thread
From: Patrik Lundquist @ 2016-01-25  9:03 UTC (permalink / raw)
  To: John Smith; +Cc: linux-btrfs

On 25 January 2016 at 01:12, John Smith <lenovomi@gmail.com> wrote:
>
> Hello,
>
> what else/ or where it was corrupted?

It got corrupted after leaving the source disk and before btrfs
calculated the data checksum during file write.


> I didnt check data checksum
> errors - is it possible with some btrfs tools? But yes I can read
> whole file after stored on btrfs.

You wouldn't have been able to read the whole corrupted file from
btrfs if the corruption took place after the file was written (due to
wrong checksum).


> By checksum errors, do you mean sha256 hash?

No, I mean the built-in data checksum in btrfs that guarantees file integrity.
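
On the question of tools: btrfs scrub reads the data and verifies it
against the stored checksums, and btrfs device stats prints per-device
error counters, including corruption_errs. A minimal sketch for
totalling that counter; the function name sum_corruption_errs and the
mount point /mnt/btrfs are placeholders.

```shell
# Sum the corruption_errs counters from `btrfs device stats` output on stdin.
# Expected line format, one line per counter and device:
#   [/dev/sdX].corruption_errs  N
sum_corruption_errs() {
    awk '$1 ~ /\.corruption_errs$/ { total += $2 } END { print total + 0 }'
}

# Typical use (needs root and a mounted btrfs; the path is a placeholder):
#   btrfs scrub start -B /mnt/btrfs
#   btrfs device stats /mnt/btrfs | sum_corruption_errs
```

A total of 0 after a scrub would be consistent with the data having been
bad before btrfs checksummed it.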


> I plan to run memcheck but on the irc i was suggested that it is not
> RAM issue, based on the output from the cmp.

Perhaps not, but you have to rule that out. Leave the memtest overnight.


> The drives are brand new
> and badblocks + smart test was executed with no errors.

That's great.



* Re: uknown issues - different sha256 hash - files corruption
  2016-01-25  9:03     ` Patrik Lundquist
@ 2016-01-25 16:53       ` John Smith
  2016-01-25 22:02         ` Henk Slager
  0 siblings, 1 reply; 17+ messages in thread
From: John Smith @ 2016-01-25 16:53 UTC (permalink / raw)
  To: Patrik Lundquist; +Cc: linux-btrfs

Hello all,

There is no network involved in the copy process. The eSATA enclosure
is attached directly to the cubox-i, and the copy is between two SATA
drives in lvm (using ext4) and two btrfs drives in raid1.


When I copy data from lvm to btrfs, the hash of the file on btrfs
differs from the one on lvm. When I copy exactly the same file
multiple times, it gets a different hash on btrfs every time.

I did a test: I copied the file from btrfs to lvm, and both hashes are
the same.


What can interfere with / mess up the data when I copy from lvm to btrfs?

All drives are brand new, badblocks was run on each drive, and
SMART doesn't show any issues.



thank you


* Re: uknown issues - different sha256 hash - files corruption
  2016-01-25 16:53       ` John Smith
@ 2016-01-25 22:02         ` Henk Slager
  2016-01-26  0:15           ` John Smith
  2016-01-26 11:54           ` John Smith
  0 siblings, 2 replies; 17+ messages in thread
From: Henk Slager @ 2016-01-25 22:02 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Jan 25, 2016 at 5:53 PM, John Smith <lenovomi@gmail.com> wrote:
> Hello all,
>
> there is no network involvement in that copy process. Esata enclosure
> is attached directly to cuboxi and copy is between 2 sata drives in
> lvm (using ext4) and two btrfs drives in raid1.

I thought it was raid0; anyhow, it seems not relevant, it's just a btrfs filesystem

> When i copy data from lvm to btrfs, hash of the file on btrfs is
> different compare to the one on lvm. When I copy exactly same file
> multiple times all the time it got different hash on btrfs.
>
> I did a test, i copied file from btrfs to lvm and both hashes are same.
>
>
> When can intervent / mess up data when i do copy between lvm to btrfs?

My first thought was that it could be due to your using SATA port
multiplexing. But then you would likely experience some corruption of
the btrfs fs, although it might take some time (days/weeks) before you
discover it. You could check dmesg to see what is there for ataX.Y.
A mitigation would be to do the same copy tests over USB, although
port-multiplexing might still be in effect, I don't know that.
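
One way to narrow down the dmesg output for ataX.Y is a small filter
like the sketch below. The function name ata_problems and the keyword
list are assumptions and not an exhaustive set of libata messages.

```shell
# Keep only kernel log lines about an ATA link or device (ataX or ataX.Y)
# that also mention a likely problem.
ata_problems() {
    grep -E '(^|[^[:alnum:]])ata[0-9]+(\.[0-9]+)?:' \
        | grep -iE 'error|exception|failed|reset|timeout' || true
}

# Typical use:
#   dmesg | ata_problems
```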

It could also be that the cubox with its current firmware+software
fails under certain loads (btrfs writing is quite different from
ext4), it might be that it is just your cubox hardware, maybe power
issues or whatever. Or some btrfs/other piece of code overwrites the
rsync/btrfs write buffers before crc every now and then. Or just
memory errors as already suggested.

>
> All drives are brand new, badblocks was executed on each drive, also
> smart doesnt shows up any issues.

The drives are not the issue, I think. I would temporarily replace the
cubox with a typical x86_64 system with kernel v4.4 from kernel.org,
connect the icy box via an eSATA port (or USB if you don't have one)
and execute the copy tests lvm -> btrfs. From there you can see if it
is btrfs on the cubox, and maybe then just connect a single SATA disk
to the cubox and repeat the tests, and maybe try a bit older kernel
(like 3.18) as well.

And what if you do this several times on cubox and PC?
# dd if=/dev/zero of=<fileonbtrfs> bs=1M count=130000
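
Presumably the point of writing zeros is that any corruption becomes
trivial to spot afterwards. A sketch of such a check follows, assuming
a cmp that supports -n (GNU cmp does); the function name is_all_zero is
made up.

```shell
# Return success if FILE contains only zero bytes.
is_all_zero() {
    local file=$1 size
    size=$(wc -c < "$file" | tr -d ' ')
    # -n limits the comparison to the file's own length, since /dev/zero never ends.
    cmp -s -n "$size" "$file" /dev/zero
}

# Typical use after the dd run:
#   is_all_zero <fileonbtrfs> && echo clean || echo corrupted
```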


* Re: uknown issues - different sha256 hash - files corruption
  2016-01-25 22:02         ` Henk Slager
@ 2016-01-26  0:15           ` John Smith
       [not found]             ` <CAAcrkYK7p1kFNS_p7s12Qv3Hafemq89hgocfa+DoX6Y15bXeBA@mail.gmail.com>
  2016-01-26 11:54           ` John Smith
  1 sibling, 1 reply; 17+ messages in thread
From: John Smith @ 2016-01-26  0:15 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

Hello,

I copied data from btrfs to btrfs (same file) and the hashes are the
same. Also btrfs to lvm was okay.

Still waiting for the results of lvm to lvm.

At the moment I don't have any box around that I can use to replace
the cubox-i.

Could it really be a HW problem? As you can see, data was copied from
btrfs to btrfs without any issues. The problem looks like it is only
between lvm and btrfs. I'm so confused.


* Re: uknown issues - different sha256 hash - files corruption
  2016-01-25 22:02         ` Henk Slager
  2016-01-26  0:15           ` John Smith
@ 2016-01-26 11:54           ` John Smith
  2016-01-26 12:23             ` Patrik Lundquist
  1 sibling, 1 reply; 17+ messages in thread
From: John Smith @ 2016-01-26 11:54 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

Hi,

Could this be connected to
https://bugzilla.kernel.org/show_bug.cgi?id=93581 (I have 2x 3TB WD
Red drives, kernel 4.4)?

I ran sha256sum multiple times on the file located on lvm, and each
time I got a different hash.
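
The repeated-hash check described above can be scripted roughly as
follows. This is a sketch; the function name rehash_distinct is made
up, and it assumes sha256sum, seq and awk are available.

```shell
# Hash FILE N times and print the number of distinct sha256 values seen.
# A healthy read path gives 1. Repeat reads of a small file may be served
# from the page cache; as root, `sync; echo 3 > /proc/sys/vm/drop_caches`
# between runs forces the data to be read from disk again.
rehash_distinct() {
    local file=$1 n=${2:-5} i
    for i in $(seq 1 "$n"); do
        sha256sum "$file" | awk '{ print $1 }'
    done | sort -u | wc -l | tr -d ' '
}
```

A result above 1 for a file that nothing is writing to points at the
read path of the source volume rather than at the copy or at btrfs.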




>>>> > The data must have become corrupted before it was written to the btrfs
>>>> > volume, since you can read it back without data checksum errors.
>>>> >
>>>> > Try copying the big file a couple of times but from btrfs to ext4 to
>>>> > see if you get data checksum errors.
>>>> >
>>>> > Run memcheck and long SMART tests on the disks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: uknown issues - different sha256 hash - files corruption
       [not found]             ` <CAAcrkYK7p1kFNS_p7s12Qv3Hafemq89hgocfa+DoX6Y15bXeBA@mail.gmail.com>
@ 2016-01-26 11:58               ` John Smith
  2016-01-26 11:59                 ` John Smith
  0 siblings, 1 reply; 17+ messages in thread
From: John Smith @ 2016-01-26 11:58 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

The results of the cmp are here:
https://drive.google.com/file/d/0B6RZ_9vVuTEcQ3RUSV9hdXU3eUE/view?usp=sharing


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: uknown issues - different sha256 hash - files corruption
  2016-01-26 11:58               ` John Smith
@ 2016-01-26 11:59                 ` John Smith
  0 siblings, 0 replies; 17+ messages in thread
From: John Smith @ 2016-01-26 11:59 UTC (permalink / raw)
  To: Henk Slager; +Cc: linux-btrfs

Adding to my previous email:

I also copied almost 2 TB of small files from LVM to btrfs, and all
hashes match.

Here is the full output of cmp -l file_on_lvm file_on_btrfs:

https://drive.google.com/file/d/0B6RZ_9vVuTEcQ3RUSV9hdXU3eUE/view?usp=sharing



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: uknown issues - different sha256 hash - files corruption
  2016-01-26 11:54           ` John Smith
@ 2016-01-26 12:23             ` Patrik Lundquist
  2016-01-26 12:27               ` John Smith
  2016-01-26 14:32               ` John Smith
  0 siblings, 2 replies; 17+ messages in thread
From: Patrik Lundquist @ 2016-01-26 12:23 UTC (permalink / raw)
  To: John Smith; +Cc: linux-btrfs

On 26 January 2016 at 12:54, John Smith <lenovomi@gmail.com> wrote:
>
> is it possible there is connection with
> https://bugzilla.kernel.org/show_bug.cgi?id=93581, i have 2x 3tb wd
> red drives, kernel 4.4.

No, WD Red aren't Shingled Magnetic Recording (SMR) drives.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: uknown issues - different sha256 hash - files corruption
  2016-01-26 12:23             ` Patrik Lundquist
@ 2016-01-26 12:27               ` John Smith
  2016-01-26 14:32               ` John Smith
  1 sibling, 0 replies; 17+ messages in thread
From: John Smith @ 2016-01-26 12:27 UTC (permalink / raw)
  To: Patrik Lundquist; +Cc: linux-btrfs

But I heard there was a serious issue with LVM/MD + ext4 in 3.x/4.x kernels?

On Tue, Jan 26, 2016 at 1:23 PM, Patrik Lundquist
<patrik.lundquist@gmail.com> wrote:
> On 26 January 2016 at 12:54, John Smith <lenovomi@gmail.com> wrote:
>>
>> is it possible there is connection with
>> https://bugzilla.kernel.org/show_bug.cgi?id=93581, i have 2x 3tb wd
>> red drives, kernel 4.4.
>
> No, WD Red aren't Shingled Magnetic Recording (SMR) drives.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: uknown issues - different sha256 hash - files corruption
  2016-01-26 12:23             ` Patrik Lundquist
  2016-01-26 12:27               ` John Smith
@ 2016-01-26 14:32               ` John Smith
  2016-01-26 16:51                 ` Duncan
  1 sibling, 1 reply; 17+ messages in thread
From: John Smith @ 2016-01-26 14:32 UTC (permalink / raw)
  To: Patrik Lundquist; +Cc: linux-btrfs

Ok,

I finished testing, and corruption (different hashes) happens only when
copying lvm -> btrfs and lvm -> lvm.

I also computed sha256 on the same file on lvm three times and got three
different hashes.

Is it still a hardware issue, or a bug in LVM/ext4?




On Tue, Jan 26, 2016 at 1:23 PM, Patrik Lundquist
<patrik.lundquist@gmail.com> wrote:
> On 26 January 2016 at 12:54, John Smith <lenovomi@gmail.com> wrote:
>>
>> is it possible there is connection with
>> https://bugzilla.kernel.org/show_bug.cgi?id=93581, i have 2x 3tb wd
>> red drives, kernel 4.4.
>
> No, WD Red aren't Shingled Magnetic Recording (SMR) drives.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: uknown issues - different sha256 hash - files corruption
  2016-01-26 14:32               ` John Smith
@ 2016-01-26 16:51                 ` Duncan
  2016-01-26 17:41                   ` John Smith
  0 siblings, 1 reply; 17+ messages in thread
From: Duncan @ 2016-01-26 16:51 UTC (permalink / raw)
  To: linux-btrfs

John Smith posted on Tue, 26 Jan 2016 15:32:21 +0100 as excerpted:

> I finished testing, and corruption (different hashes) happens only when
> copying lvm -> btrfs and lvm -> lvm.
> 
> I also computed sha256 on the same file on lvm three times and got three
> different hashes.
> 
> Is it still a hardware issue, or a bug in LVM/ext4?

That does seem to clear btrfs, but it could be either ext4 or lvm, or 
under the lvm at the physical volume level, and hardware is still a 
possibility as well, tho rather less likely, as btrfs likely stresses the 
hardware more than lvm or ext4.

The next question is whether it's the ext4 or lvm layer.  There are two 
ways to test it that I can think of.

One would be to run an sha256 hash test directly on the logical volume 
the filesystem is normally created on (with the lvm assembled in 
read-only mode and/or without the filesystem on top of it mounted or 
mounted read-only).  Does that return the same hash when run multiple times?

Obviously the problem there is the size of the logical volume; getting a 
hash of the entire raw volume is likely to take some time.  But this 
should be the best test as it eliminates the filesystem from the equation 
entirely.

Another alternative would be trying some filesystem other than ext4 on 
the logical volume, say btrfs, xfs or reiserfs.

Either way, if the errors remain without ext4 being in the picture, 
either because you're hashing the raw device or because you're using some 
other filesystem, then that pretty well clears ext4.  If the errors go 
away, then heading to the ext4 list would probably be best, as ext4 would 
seem to be the culprit.

If there are still errors at the logical volume device level, then the next 
question is whether they appear on the raw physical volumes that the 
logical volume is assembled out of, or not.  Again, I'd suggest hashing 
the raw devices repeatedly and see if they return the same hashes each 
time.  If not, then it's likely either the hardware or the device driver, 
and you'd contact the device driver maintainer.  (Here, I'd probably file 
a bug on kernel bugzilla, YMMV.)  If the raw physical devices hash 
consistently while the LVM logical volume doesn't, then it's time to 
contact the LVM folks.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: uknown issues - different sha256 hash - files corruption
  2016-01-26 16:51                 ` Duncan
@ 2016-01-26 17:41                   ` John Smith
  2016-01-27  8:00                     ` Duncan
  0 siblings, 1 reply; 17+ messages in thread
From: John Smith @ 2016-01-26 17:41 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

Hello,
I'm not sure I have the skills to do that LVM/ext4 test. I found this:
https://www.phoronix.com/scan.php?page=news_item&px=Linux-4-EXT4-RAID-Issue-Found

Maybe there is some connection?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: uknown issues - different sha256 hash - files corruption
  2016-01-26 17:41                   ` John Smith
@ 2016-01-27  8:00                     ` Duncan
  0 siblings, 0 replies; 17+ messages in thread
From: Duncan @ 2016-01-27  8:00 UTC (permalink / raw)
  To: linux-btrfs

John Smith posted on Tue, 26 Jan 2016 18:41:34 +0100 as excerpted:

> I'm not sure I have the skills to do that LVM/ext4 test. I found this:
> https://www.phoronix.com/scan.php?page=news_item&px=Linux-4-EXT4-RAID-Issue-Found
> 
> Maybe there is some connection?

I doubt it as you're doing lvm, not mdraid.  Plus that should be fixed on 
at least current kernels.

As for how to do the test, just sha256sum the devices themselves 
(/dev/whatever, obviously as root, obviously in binary mode), instead of 
the files on top of the filesystem.  Make sure the filesystem either isn't 
mounted or is mounted read-only while taking the hash of its device, so 
the content doesn't change mid-hash or between runs.

The devices are obviously going to be quite large, terabyte scale I 
guess, compared to 100-200 GiB files, so it'll probably take some time to 
get a full hash.

First you'd want to hash the lvm logical volume device, the one that the 
filesystem is on.  Hash it two or three times and see if it's the same 
each time, or different.  If it's different, then you know that the 
problem is below that level.  If it's the same, then the problem is above 
that level, probably in ext4.

If the lvm logical volume hashes come out different, then try the same 
thing on the raw physical devices, /dev/sda or whatever, that make up the 
logical volume.  If the lvm logical device hashes come out different but 
the component physical device hashes come out the same, the problem's in 
the lvm level.  If the physical device hashes come out different also, 
then the problem is below lvm, in either the device drivers, or the 
hardware/firmware itself.

I don't run lvm here so I can't give you a specific example of that, and 
I have my physical devices partitioned, so the example below uses one of 
them, not the unpartitioned device (which I could do but some of the 
partitions are mounted so I don't want to try it), but here's what I just 
ran here, for a device partition that contains a filesystem I know isn't 
mounted at the moment.  Again, as root, thus the # prompt indicating root:

# sha256sum -b /dev/sda7
34169009f1dbdf93bd60e2f466b10c98323de695d5d10eb42cab7879b08a0adf */dev/sda7
# sha256sum -b /dev/sda7
34169009f1dbdf93bd60e2f466b10c98323de695d5d10eb42cab7879b08a0adf */dev/sda7
# 

As you can see, in my case, the two hashes came out the same.  I expected 
that of course, and would have been getting real worried right about now 
if they hadn't.
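
The repeat-and-compare check described above could be scripted roughly as 
follows.  This is only a minimal sketch, not something posted in the 
thread: the function name hash_repeat and the default of three runs are 
assumptions made for illustration, and it relies on nothing beyond 
sha256sum -b as used in the example.

```shell
#!/bin/sh
# Hash a file or block device several times and report whether all runs agree.
# Usage: hash_repeat PATH [RUNS]   (RUNS defaults to 3; the name is hypothetical)
hash_repeat() {
    target=$1
    runs=${2:-3}
    first=""
    i=1
    while [ "$i" -le "$runs" ]; do
        # sha256sum prints "<digest> *<name>"; stop if the read itself fails.
        sum=$(sha256sum -b "$target") || return 2
        sum=${sum%% *}                 # keep only the hex digest
        echo "run $i: $sum"
        if [ -z "$first" ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            echo "MISMATCH on $target"
            return 1
        fi
        i=$((i + 1))
    done
    echo "consistent: $target"
}
```

Run as root, it would be pointed first at the logical volume device and, 
only if the runs disagree, at each underlying physical device, for 
instance hash_repeat /dev/mapper/VG-LV (a placeholder name, not a device 
from this thread), with the filesystem unmounted or mounted read-only.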

As it happens I have reasonably fast SSDs, and that partition was only 24 
GiB in size, so it didn't take a horribly long time.  Tho it still took 
some time.  I actually reran it with # time sha256sum ... to see exactly 
how long it took, and time said it took 2 minutes, 59.637 seconds, real 
wall time, 2:50 user time and 9.2 seconds system time, so basically 100% 
of one core doing the sha256 sum calculation the whole time, probably 
single-core CPU bound, not SSD thruput bound.  That's 8 GiB/min or about 
136 MiB/sec, thruput (yeah, definitely single-core CPU bound as the SSD's 
rated 600-ish MiB/sec, basically topping out the SATA3 600 bandwidth), on 
my amd fx6100 (6-core bulldozer1) CPU, slightly upclocked to 3.6 GHz.  
With a similar performing CPU, a 1 TB device should take about two hours 
to hash.
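
The arithmetic behind that estimate can be reproduced with a few lines 
of shell and awk.  This is a sketch for checking the numbers only; the 
function name estimate_hash_time is hypothetical, and 931 GiB is used as 
an approximation of a 1 TB (10^12 byte) device.

```shell
#!/bin/sh
# Estimate hashing throughput and the time needed for a larger device.
# Arguments: size already hashed (GiB), elapsed seconds, target size (GiB).
estimate_hash_time() {
    LC_ALL=C awk -v gib="$1" -v secs="$2" -v target="$3" 'BEGIN {
        rate  = gib * 1024 / secs              # MiB per second
        hours = target * 1024 / rate / 3600    # hours for the target size
        printf "%.1f MiB/s, %.1f hours\n", rate, hours
    }'
}

# Figures from the message: 24 GiB in 2 min 59.637 s, scaled to about 1 TB.
estimate_hash_time 24 179.637 931
```

With those inputs the result is roughly 137 MiB/s and just under two 
hours, which agrees with the figures quoted above.  Substituting the 
time a 130GB file took on the cubox gives a comparable estimate for its 
devices.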

I guess it'll probably take rather longer on that cubox cpu, and that 
it'll be cpu-bound even on spinning rust, there.  You have the size of 
the files you tested, so if you have the time it took and the size of the 
devices, I guess you can do the math to figure out about how long the 
full device test is likely to take, but we're definitely looking at 
hours...

You could try sha1sum instead of sha256sum.  It should be faster, and 
we're just taking hashes here so cryptographic strength isn't an issue.  
An sha1 run on the same partition as above timed to 1 minute 51 seconds, 
real wall time, here, so much faster.

md5sum should be even faster, 1 minute 30 seconds on that partition, 
twice the speed of sha256sum, here.  Obviously it's no longer trusted for 
cryptographic checksums, but for quick corruption checks, as here, it 
should be fine.  Depending on the cpu and speed of the devices, 
particularly on spinning rust, md5sum may well bottleneck on the speed of 
the disks rather than the speed of the cpu.
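
To see which of the three tools is fastest on a given machine before 
committing to a multi-hour run, something like the following could be 
tried on a smaller file first.  It is a rough sketch under stated 
assumptions: the function name compare_hash_speed is made up for 
illustration, and it uses date +%s for whole-second timing, which is 
coarse but enough at these sizes.

```shell
#!/bin/sh
# Time sha256sum, sha1sum and md5sum on the same input (file or device).
# Usage: compare_hash_speed PATH   (the function name is hypothetical)
compare_hash_speed() {
    for tool in sha256sum sha1sum md5sum; do
        start=$(date +%s)
        # -b selects binary mode, as in the sha256sum example earlier.
        "$tool" -b "$1" >/dev/null || return 1
        end=$(date +%s)
        echo "$tool: $((end - start)) s"
    done
}
```

The first pass may also warm the page cache for small inputs, so the 
test file should be larger than RAM, or the comparison repeated, before 
drawing conclusions.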

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-01-27  8:00 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-24 22:00 uknown issues - different sha256 hash - files corruption John Smith
2016-01-25  0:06 ` Patrik Lundquist
     [not found]   ` <CAAcrkY+3w--OGYWwben+KYohdqwBBryDn8REJ6tiBk4jM3Tp9w@mail.gmail.com>
2016-01-25  9:03     ` Patrik Lundquist
2016-01-25 16:53       ` John Smith
2016-01-25 22:02         ` Henk Slager
2016-01-26  0:15           ` John Smith
     [not found]             ` <CAAcrkYK7p1kFNS_p7s12Qv3Hafemq89hgocfa+DoX6Y15bXeBA@mail.gmail.com>
2016-01-26 11:58               ` John Smith
2016-01-26 11:59                 ` John Smith
2016-01-26 11:54           ` John Smith
2016-01-26 12:23             ` Patrik Lundquist
2016-01-26 12:27               ` John Smith
2016-01-26 14:32               ` John Smith
2016-01-26 16:51                 ` Duncan
2016-01-26 17:41                   ` John Smith
2016-01-27  8:00                     ` Duncan
2016-01-25  0:28 ` Duncan
2016-01-25  0:48   ` Chris Murphy
