* bad CRC in data error on ARM
From: huang jun @ 2015-05-14 23:51 UTC
  To: ceph-devel

Hi all,

We run a Ceph cluster on an ARM platform (arm64, Linux kernel 3.14,
Ubuntu 14.10) and use "dd if=/dev/zero of=/mnt/test bs=4M count=125"
to write data. On the OSD side, we get a bad data CRC error.

The kclient log: (tid=6)
May 14 17:21:08 node103 kernel: [  180.194312] CPU[0] libceph:
send_request ffffffc8d252f000 tid-6 to osd0 flags 36 pg 1.9aae829f req
data size is 4194304
May 14 17:21:08 node103 kernel: [  180.194316] CPU[0] libceph: tid-6
----- ffffffc0702f66c8 to osd0 42=osd_op len 197+0+4194304 -----
libceph: tid-6 front_crc is 388648745 middle_crc is 0 data_crc is 3036014994

The OSD-0 log:
2015-05-13 08:12:50.049345 7f378d8d8700  0 seq  3 tid 6 front_len 197
mid_len 0 data_len 4194304
2015-05-13 08:12:50.049348 7f378d8d8700  0 crc in front 388648745 exp 388648745
2015-05-13 08:12:50.049395 7f378d8d8700  0 crc in middle 0 exp 0
2015-05-13 08:12:50.049964 7f378d8d8700  0 crc in data 0 exp 3036014994
2015-05-13 08:12:50.050234 7f378d8d8700  0 bad crc in data 0 != exp 3036014994

Some questions:
1) We use the Ceph 0.80.7 release and compile it on ARM. Should this
work, or does Ceph's code have an ARM branch?

2) We wrote 125 objects and only a few of them report a CRC error.
For the good objects, data_crc is 0 on both the OSD and the kclient.
For the bad objects, data_crc is non-zero on the kclient, but the OSD
calculates 0. The object data came from /dev/zero, so I think
data_crc should be 0. Am I right?

-- 
thanks
huangjun


* Re: bad CRC in data error on ARM
From: Steve Capper @ 2015-05-15  8:36 UTC
  To: huang jun; +Cc: ceph-devel, Yazen Ghannam

On 15 May 2015 at 00:51, huang jun <hjwsm1989@gmail.com> wrote:
> hi,all

Hi HuangJun,

>
> We run ceph cluster on ARM platform (arm64, linux kernel 3.14, OS
> ubuntu 14.10), and use "dd if=/dev/zero of=/mnt/test bs=4M count=125"
> to write data.  On the osd side, we got bad data CRC error.
>
> The kclient log: (tid=6)
> May 14 17:21:08 node103 kernel: [  180.194312] CPU[0] libceph:
> send_request ffffffc8d252f000 tid-6 to osd0 flags 36 pg 1.9aae829f req
> data size is 4194304
> May 14 17:21:08 node103 kernel: [  180.194316] CPU[0] libceph: tid-6
> ----- ffffffc0702f66c8 to osd0 42=osd_op len 197+0+4194304 -----
> libceph: tid-6 front_crc is 388648745 middle_crc is 0 data_crc is 3036014994
>
> The OSD-0 log:
> 2015-05-13 08:12:50.049345 7f378d8d8700  0 seq  3 tid 6 front_len 197
> mid_len 0 data_len 4194304
> 2015-05-13 08:12:50.049348 7f378d8d8700  0 crc in front 388648745 exp 388648745
> 2015-05-13 08:12:50.049395 7f378d8d8700  0 crc in middle 0 exp 0
> 2015-05-13 08:12:50.049964 7f378d8d8700  0 crc in data 0 exp 3036014994
> 2015-05-13 08:12:50.050234 7f378d8d8700  0 bad crc in data 0 != exp 3036014994
>
> some considerations:
> 1) we use ceph 0.80.7 realse version and compile it on ARM, did this
> works? or  does ceph's code has ARM branch?

We did run a Ceph version close to that on 64-bit ARM; I'm checking
out 0.80.7 now to test it.
In v9.0.0, there is some code to use the ARM optional crc32c
instructions, but this isn't in 0.80.7.

>
> 2) as we have write 125 objects, only few of them report CRC error,
> and the right object's data_crc is 0 both on osd and kclient. the
> wrong object's data_crc is not 0 on kclient, but osd calculate result
> 0. the object data came from /dev/zero, i think the data_crc should be
> 0, am i right?
>

If the initial CRC seed value is non-zero, then the CRC of a buffer
full of zeros won't be zero.
So ceph_crc32c(somethingnonzero, zerofilledbuffer, len) will be non-zero.

I would like to reproduce this problem here.
What steps did you take before this error occurred?
Is this a cephfs filesystem or something on top of an RBD image?
Which kernel are you running? Is it the one that comes with Ubuntu?
(If so, which package version is it?)

Cheers,
--
Steve


* Re: bad CRC in data error on ARM
From: huang jun @ 2015-05-16  2:10 UTC
  To: Steve Capper; +Cc: ceph-devel, Yazen Ghannam

Hi Steve,

2015-05-15 16:36 GMT+08:00 Steve Capper <steve.capper@linaro.org>:
> I would like to reproduce this problem here.
> What steps did you take before this error occurred?
> Is this a cephfs filesystem or something on top of an RBD image?
> Which kernel are you running? Is it the one that comes with Ubuntu?
> (If so which package version is it?)
>
We use Linux kernel 3.14, we have only tested on Ubuntu, and the Ceph
version is v0.80.7. Both CephFS and RBD images have the CRC problem.
I'm not sure whether it's related to memory: we tested many times,
but only a few objects reported a CRC error. As I mentioned, I suspect
a memory fault changed the data, because we wrote 125 objects and
every data_crc is 0 except the bad object's. Any tips are welcome.




-- 
thanks
huangjun


* Re: bad CRC in data error on ARM
From: Haomai Wang @ 2015-05-16  9:30 UTC
  To: huang jun; +Cc: Steve Capper, ceph-devel, Yazen Ghannam

Does this always happen, or only occasionally?




-- 
Best Regards,

Wheat


* Re: bad CRC in data error on ARM
From: huang jun @ 2015-05-16 10:21 UTC
  To: Haomai Wang; +Cc: Steve Capper, ceph-devel, Yazen Ghannam

It always happens; every test has such errors. Our cluster and
client running on x86 work fine, and we have never seen a bad CRC
error there.





-- 
thanks
huangjun


* Re: bad CRC in data error on ARM
From: huang jun @ 2015-05-16 10:54 UTC
  To: Haomai Wang; +Cc: Steve Capper, ceph-devel, Yazen Ghannam

<<Even if from /dev/zero, the data crc shouldn't be 0.
We printed the data CRC of every 4M object; so far they all seem to be 0.
<<I guess osd(arm) doesn't do crc computing. But from code, crc for arm
<<should be fine
When decoding a message, the messenger checks front_crc, middle_crc and
data_crc, so not only the OSD but also the MON and MDS compute CRCs.
The OSD computes a data CRC of 0, which differs from the data_crc
carried in the message footer (footer.data_crc).




-- 
thanks
huangjun


* Re: bad CRC in data error on ARM
From: Haomai Wang @ 2015-05-16 11:25 UTC
  To: huang jun; +Cc: Steve Capper, ceph-devel, Yazen Ghannam

On Sat, May 16, 2015 at 6:54 PM, huang jun <hjwsm1989@gmail.com> wrote:
> <<Even if from /dev/zero, the data crc shouldn't be 0.
> we print all 4M object's data crc, it seems all 0 until now.
> <<I guess osd(arm) doesn't do crc computing. But from code, crc for arm
> <<should be fine
> When decode a message, it will check the fron_crc, middle_crc and also data_crc,
> so not OSD but MON and MDS will do crc computing, and the OSD side
> compute the CRC value is 0, which is different with the data_crc in
> message footer.data_crc.

I don't follow. Is the core problem that the OSD computes a
wrong CRC value?




-- 
Best Regards,

Wheat


* Re: bad CRC in data error on ARM
From: huang jun @ 2015-05-16 11:34 UTC
  To: Haomai Wang; +Cc: Steve Capper, ceph-devel, Yazen Ghannam

We think the client sends the wrong CRC value. We printed the data
on the OSD side and it looks fine (no data changed), but the
calculated CRC does not equal the one passed from the client.





-- 
thanks
huangjun


* Re: bad CRC in data error on ARM
From: Haomai Wang @ 2015-05-16 14:07 UTC
  To: huang jun; +Cc: Steve Capper, ceph-devel, Yazen Ghannam

Maybe I'm missing something, but from your osd.log:

2015-05-13 08:12:50.050234 7f378d8d8700  0 bad crc in data 0 != exp 3036014994

The OSD side computes a CRC of "0" from the data; that shouldn't
happen if there are any data bytes.

On Sat, May 16, 2015 at 7:34 PM, huang jun <hjwsm1989@gmail.com> wrote:
> we think is the client send the wrong crc value.
> we print data on OSD side, it seems ok, no data changed, but the
> calculated crc not equals the one passed from client.
>
>
> --
> thanks
> huangjun



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: bad CRC in data error on ARM
  2015-05-16 14:07               ` Haomai Wang
@ 2015-05-18  1:20                 ` huang jun
  2015-05-18  7:06                   ` Haomai Wang
  0 siblings, 1 reply; 12+ messages in thread
From: huang jun @ 2015-05-18  1:20 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Steve Capper, ceph-devel, Yazen Ghannam

Hi Haomai,
I tested it again using "dd if=/dev/zero of=/mnt/test bs=4M
count=1000" on our x86 cluster, and confirmed that the data_crc of
every 4194304-byte object is 0 on both the client and the OSD side.
Note that we use /dev/zero to generate the data bytes.

2015-05-16 22:07 GMT+08:00 Haomai Wang <haomaiwang@gmail.com>:
> Maybe I'm missing something, but from your osd.log:
>
> 2015-05-13 08:12:50.050234 7f378d8d8700  0 bad crc in data 0 != exp 3036014994
>
> the OSD side computes a crc value of "0" from the data; that shouldn't
> happen if there are any data bytes.
>
>
>
> --
> Best Regards,
>
> Wheat



-- 
thanks
huangjun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: bad CRC in data error on ARM
  2015-05-18  1:20                 ` huang jun
@ 2015-05-18  7:06                   ` Haomai Wang
  2015-05-18 10:43                     ` huang jun
  0 siblings, 1 reply; 12+ messages in thread
From: Haomai Wang @ 2015-05-18  7:06 UTC (permalink / raw)
  To: huang jun; +Cc: Steve Capper, ceph-devel, Yazen Ghannam

Oh, very sorry, I need to refresh my memory.

Yeah, if the data is all zeros and the messenger code uses a zero seed,
then the crc value is 0!

When decoding fails, do you get a dump of the data content?


On Mon, May 18, 2015 at 9:20 AM, huang jun <hjwsm1989@gmail.com> wrote:
> Hi Haomai,
> I tested it again using "dd if=/dev/zero of=/mnt/test bs=4M
> count=1000" on our x86 cluster, and confirmed that the data_crc of
> every 4194304-byte object is 0 on both the client and the OSD side.
> Note that we use /dev/zero to generate the data bytes.
>
>
>
> --
> thanks
> huangjun



-- 
Best Regards,

Wheat

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: bad CRC in data error on ARM
  2015-05-18  7:06                   ` Haomai Wang
@ 2015-05-18 10:43                     ` huang jun
  0 siblings, 0 replies; 12+ messages in thread
From: huang jun @ 2015-05-18 10:43 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Steve Capper, ceph-devel, Yazen Ghannam

Yes, the data dump on the OSD side looks right: it is 4194304 bytes of "0".
It's weird that the client calculates the wrong data crc but sends the
right data content to the OSD.

2015-05-18 15:06 GMT+08:00 Haomai Wang <haomaiwang@gmail.com>:
> Oh, very sorry, I need to refresh my memory.
>
> Yeah, if the data is all zeros and the messenger code uses a zero seed,
> then the crc value is 0!
>
> When decoding fails, do you get a dump of the data content?
>
>
>
> --
> Best Regards,
>
> Wheat



-- 
thanks
huangjun

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread [2015-05-18 10:43 UTC]

Thread overview: 12+ messages
2015-05-14 23:51 bad CRC in data error on ARM huang jun
2015-05-15  8:36 ` Steve Capper
2015-05-16  2:10   ` huang jun
2015-05-16  9:30     ` Haomai Wang
2015-05-16 10:21       ` huang jun
2015-05-16 10:54         ` huang jun
2015-05-16 11:25           ` Haomai Wang
2015-05-16 11:34             ` huang jun
2015-05-16 14:07               ` Haomai Wang
2015-05-18  1:20                 ` huang jun
2015-05-18  7:06                   ` Haomai Wang
2015-05-18 10:43                     ` huang jun
