* [Ocfs2-devel] The last part of the file is zeroed out when write N random bytes
@ 2021-09-27  7:16 Gang He
  2021-09-27  7:49 ` Joseph Qi
  0 siblings, 1 reply; 4+ messages in thread
From: Gang He @ 2021-09-27  7:16 UTC (permalink / raw)
  To: ocfs2-devel

Hi List,

I'd like to report a data loss bug that occurs when writing N random bytes to a file. I am reporting it now because I saw some related commits in the past few weeks.
I can reproduce this bug reliably with the latest ocfs2 kernel module code, as follows:
1) Create a three-node (e.g. ghe-tw-nd1, ghe-tw-nd2, ghe-tw-nd3) ocfs2 cluster and attach a shared disk (e.g. /dev/vdb).
2) Format the disk with the command "mkfs.ocfs2 -N 4 -b 4096 -C 1048576 /dev/vdb" and mount it at /mnt/shared on each node. The cluster size must be greater than 4K; this is the key to the problem.
3) Copy the file write/test scripts to the /mnt/shared directory, then run the test script on node1 to reproduce the bug.
    file write script ocfs2_fallocate_bug_plain_write.py: https://pastebin.com/QsXcD8rq
    file test script ocfs2_loop.sh: https://pastebin.com/eTUe2hkW
4) You will then hit the bug: the md5sum of the file differs between node1 and node2.
    In fact, the last part of the file is zeroed out when read from node2.
    e.g.
    file dump from node1: https://pastebin.com/HB92TVS0
    file dump from node2: https://pastebin.com/jBG7HdSz
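
For readers without access to the pastebin links, the check in steps 3) and 4) can be sketched roughly as below. This is only an illustration and NOT the ocfs2_fallocate_bug_plain_write.py or ocfs2_loop.sh scripts referenced above; the helper names (write_random_bytes, md5_of_file, trailing_zero_start) are hypothetical.

```python
import hashlib
import os


def write_random_bytes(path, n):
    """Write n random bytes to path, fsync the file, and return the data."""
    data = os.urandom(n)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return data


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def trailing_zero_start(path):
    """Return the offset where the trailing run of zero bytes begins.

    The result equals the file size when the file does not end in zeros.
    The whole file is read into memory, so this suits small test files only.
    """
    with open(path, "rb") as f:
        data = f.read()
    return len(data.rstrip(b"\x00"))
```

One way to use it: call write_random_bytes() on node1 for a file under /mnt/shared, then run md5_of_file() on the same path from node1 and from node2. If the digests differ, trailing_zero_start() on node2 shows where the zeroed tail begins.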

More information:
This bug does not exist on some old kernels (e.g. linux-4.12.14-120), but it does happen on some newer kernels. I suspect it is probably NOT caused by ocfs2 commits, because the problem also occurred when I used old ocfs2 kernel module code on the newer kernels.
Anyway, if you have any comments, please reply to this mail.
 
Thanks
Gang

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


* Re: [Ocfs2-devel] The last part of the file is zeroed out when write N random bytes
  2021-09-27  7:16 [Ocfs2-devel] The last part of the file is zeroed out when write N random bytes Gang He
@ 2021-09-27  7:49 ` Joseph Qi
  2021-09-27  7:57   ` Gang He
  0 siblings, 1 reply; 4+ messages in thread
From: Joseph Qi @ 2021-09-27  7:49 UTC (permalink / raw)
  To: Gang He, ocfs2-devel, Markov, Andrey, Junxiao Bi


Last week, Andrey Markov reported a similar issue, but unfortunately not
on the mailing list.

Junxiao also resolved a similar issue recently. Can you reproduce
the bug on the latest kernel?

Thanks,
Joseph


* Re: [Ocfs2-devel] The last part of the file is zeroed out when write N random bytes
  2021-09-27  7:49 ` Joseph Qi
@ 2021-09-27  7:57   ` Gang He
  2021-09-29  0:57     ` Gang He
  0 siblings, 1 reply; 4+ messages in thread
From: Gang He @ 2021-09-27  7:57 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel, Markov, Andrey, Junxiao Bi



On 2021/9/27 15:49, Joseph Qi wrote:
> 
> Last week, Andrey Markov reported a similar issue, but unfortunately not
> on the mailing list.
> 
> Junxiao also resolved a similar issue recently. Can you reproduce
> the bug on the latest kernel?
Yes, I can reproduce this issue with the latest code.
The cluster size must be greater than 4K (e.g. 8K or 1M); this is the key
to the problem.
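
To make the cluster-size condition concrete, here is a small arithmetic sketch. It assumes file data is allocated in whole clusters starting at offset 0, and it only counts how many blocks of the last cluster lie past EOF. It does not claim to describe the mechanism of the bug, and the function name tail_layout is hypothetical.

```python
def tail_layout(file_size, block_size=4096, cluster_size=1048576):
    """Describe the space between EOF and the end of the last cluster.

    Defaults match the "-b 4096 -C 1048576" mkfs.ocfs2 options used in
    the reproducer. Assumes whole-cluster allocation from offset 0.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Start of the cluster that contains the last byte of the file.
    last_cluster_start = ((file_size - 1) // cluster_size) * cluster_size
    cluster_end = last_cluster_start + cluster_size
    # End of the block that contains the last byte (EOF rounded up).
    eof_block_end = -(-file_size // block_size) * block_size
    return {
        "clusters_allocated": cluster_end // cluster_size,
        "bytes_past_eof_in_last_block": eof_block_end - file_size,
        "whole_blocks_past_eof_in_last_cluster":
            (cluster_end - eof_block_end) // block_size,
    }
```

With a 4K cluster the last value is always 0, because the cluster and the block end at the same offset. With an 8K or 1M cluster it can be non-zero, which is the only configuration in which the problem was observed.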

Thanks
Gang



* Re: [Ocfs2-devel] The last part of the file is zeroed out when write N random bytes
  2021-09-27  7:57   ` Gang He
@ 2021-09-29  0:57     ` Gang He
  0 siblings, 0 replies; 4+ messages in thread
From: Gang He @ 2021-09-29  0:57 UTC (permalink / raw)
  To: ocfs2-devel

Hi Guys,

Just an update.
Based on our testing, the problem was caused by the following commit in fs/buffer.c:

commit 6dbf7bb555981fb5faf7b691e8f6169fc2b2e63b
Author: Jan Kara <jack@suse.cz>
Date:   Fri Sep 4 10:58:51 2020 +0200

     fs: Don't invalidate page buffers in block_write_full_page()


Thanks
Gang
