* [PATCH v2] ocfs2: fix data corruption by fallocate
From: Junxiao Bi @ 2021-05-21 23:36 UTC
To: ocfs2-devel; +Cc: jack, joseph.qi, linux-fsdevel, junxiao.bi

When fallocate punches holes out of inode size, if original isize is in
the middle of last cluster, then the part from isize to the end of the
cluster will be zeroed with buffer write, at that time isize is not
yet updated to match the new size, if writeback is kicked in, it will
invoke ocfs2_writepage()->block_write_full_page() where the pages out
of inode size will be dropped. That will cause file corruption. Fix
this by zeroing out eof blocks when extending the inode size.

Running the following command with qemu-img 4.2.1 can get a corrupted
converted image file easily.

    qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
             -O qcow2 -o compat=1.1 $qcow_image.conv

The usage of fallocate in qemu is like this, it first punches holes out of
inode size, then extends the inode size.

    fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
    fallocate(11, 0, 2276196352, 65536) = 0

v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html

Cc: <stable@vger.kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---

Changes in v2:
- suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly.

 fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index f17c3d33fb18..17469fc7b20e 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode,
 	return ret;
 }
 
+/*
+ * zero out partial blocks of one cluster.
+ *
+ * start: file offset where zero starts, will be made upper block aligned.
+ * len: it will be trimmed to the end of current cluster if "start + len"
+ *      is bigger than it.
+ */
+static int ocfs2_zeroout_partial_cluster(struct inode *inode,
+					u64 start, u64 len)
+{
+	int ret;
+	u64 start_block, end_block, nr_blocks;
+	u64 p_block, offset;
+	u32 cluster, p_cluster, nr_clusters;
+	struct super_block *sb = inode->i_sb;
+	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
+
+	if (start + len < end)
+		end = start + len;
+
+	start_block = ocfs2_blocks_for_bytes(sb, start);
+	end_block = ocfs2_blocks_for_bytes(sb, end);
+	nr_blocks = end_block - start_block;
+	if (!nr_blocks)
+		return 0;
+
+	cluster = ocfs2_bytes_to_clusters(sb, start);
+	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
+				&nr_clusters, NULL);
+	if (ret)
+		return ret;
+	if (!p_cluster)
+		return 0;
+
+	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
+	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
+	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
+}
+
 /*
  * Parts of this function taken from xfs_change_file_space()
  */
@@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 {
 	int ret;
 	s64 llen;
-	loff_t size;
+	loff_t size, orig_isize;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 	struct buffer_head *di_bh = NULL;
 	handle_t *handle;
@@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 		goto out_inode_unlock;
 	}
 
+	orig_isize = i_size_read(inode);
 	switch (sr->l_whence) {
 	case 0: /*SEEK_SET*/
 		break;
@@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 		sr->l_start += f_pos;
 		break;
 	case 2: /*SEEK_END*/
-		sr->l_start += i_size_read(inode);
+		sr->l_start += orig_isize;
 		break;
 	default:
 		ret = -EINVAL;
@@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 	default:
 		ret = -EINVAL;
 	}
+
+	/* zeroout eof blocks in the cluster. */
+	if (!ret && change_size && orig_isize < size)
+		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
+					size - orig_isize);
 	up_write(&OCFS2_I(inode)->ip_alloc_sem);
 	if (ret) {
 		mlog_errno(ret);
-- 
2.24.3 (Apple Git-128)
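
As a rough illustration of the block arithmetic in ocfs2_zeroout_partial_cluster(), here is a user-space sketch. The toy_* functions and struct toy_geom are hypothetical stand-ins, not kernel API. The sketch assumes that ocfs2_blocks_for_bytes() and ocfs2_align_bytes_to_clusters() round up, and it leaves out the extent lookup and the actual zeroout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the superblock geometry. */
struct toy_geom {
	unsigned int block_bits;	/* log2(block size), e.g. 12 for 4 KiB */
	unsigned int cluster_bits;	/* log2(cluster size), e.g. 20 for 1 MiB */
};

/* Round a byte count up to whole blocks (assumed to mirror ocfs2_blocks_for_bytes()). */
static uint64_t toy_blocks_for_bytes(const struct toy_geom *g, uint64_t bytes)
{
	return (bytes + (1ULL << g->block_bits) - 1) >> g->block_bits;
}

/* Round a byte offset up to the next cluster boundary. */
static uint64_t toy_align_bytes_to_clusters(const struct toy_geom *g, uint64_t bytes)
{
	uint64_t csize = 1ULL << g->cluster_bits;

	return (bytes + csize - 1) & ~(csize - 1);
}

/*
 * Return how many blocks would be zeroed for [start, start + len),
 * trimmed to the end of the cluster that contains start.
 * The first logical block of the range is stored in *first_block.
 */
static uint64_t toy_partial_cluster_blocks(const struct toy_geom *g,
					   uint64_t start, uint64_t len,
					   uint64_t *first_block)
{
	uint64_t end = toy_align_bytes_to_clusters(g, start);

	assert(g->block_bits <= g->cluster_bits);

	/* Never zero past the requested range. */
	if (start + len < end)
		end = start + len;

	*first_block = toy_blocks_for_bytes(g, start);
	return toy_blocks_for_bytes(g, end) - *first_block;
}
```

Assuming 4 KiB blocks, 1 MiB clusters, and an original isize equal to the offset in the strace output above (2276196352), extending by 65536 bytes gives 16 blocks starting at logical block 555712. A start that is already cluster aligned yields 0 blocks, which corresponds to the early return on !nr_blocks in the patch.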
* Re: [PATCH v2] ocfs2: fix data corruption by fallocate
From: Joseph Qi @ 2021-05-23 11:52 UTC
To: Junxiao Bi, ocfs2-devel; +Cc: jack, linux-fsdevel

Hi Junxiao,
If change_size is true (!FALLOC_FL_KEEP_SIZE), it will update isize
in __ocfs2_change_file_space(). Why do we have to zeroout first?

Thanks,
Joseph

On 5/22/21 7:36 AM, Junxiao Bi wrote:
> When fallocate punches holes out of inode size, if original isize is in
> the middle of last cluster, then the part from isize to the end of the
> cluster will be zeroed with buffer write, at that time isize is not
> yet updated to match the new size, if writeback is kicked in, it will
> invoke ocfs2_writepage()->block_write_full_page() where the pages out
> of inode size will be dropped. That will cause file corruption. Fix
> this by zeroing out eof blocks when extending the inode size.
>
> Running the following command with qemu-img 4.2.1 can get a corrupted
> converted image file easily.
>
>     qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
>              -O qcow2 -o compat=1.1 $qcow_image.conv
>
> The usage of fallocate in qemu is like this, it first punches holes out of
> inode size, then extends the inode size.
>
>     fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
>     fallocate(11, 0, 2276196352, 65536) = 0
>
> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
>
> Cc: <stable@vger.kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> ---
>
> Changes in v2:
> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly.
>
>  fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index f17c3d33fb18..17469fc7b20e 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode,
>  	return ret;
>  }
>  
> +/*
> + * zero out partial blocks of one cluster.
> + *
> + * start: file offset where zero starts, will be made upper block aligned.
> + * len: it will be trimmed to the end of current cluster if "start + len"
> + *      is bigger than it.
> + */
> +static int ocfs2_zeroout_partial_cluster(struct inode *inode,
> +					u64 start, u64 len)
> +{
> +	int ret;
> +	u64 start_block, end_block, nr_blocks;
> +	u64 p_block, offset;
> +	u32 cluster, p_cluster, nr_clusters;
> +	struct super_block *sb = inode->i_sb;
> +	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
> +
> +	if (start + len < end)
> +		end = start + len;
> +
> +	start_block = ocfs2_blocks_for_bytes(sb, start);
> +	end_block = ocfs2_blocks_for_bytes(sb, end);
> +	nr_blocks = end_block - start_block;
> +	if (!nr_blocks)
> +		return 0;
> +
> +	cluster = ocfs2_bytes_to_clusters(sb, start);
> +	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
> +				&nr_clusters, NULL);
> +	if (ret)
> +		return ret;
> +	if (!p_cluster)
> +		return 0;
> +
> +	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
> +	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
> +	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
> +}
> +
>  /*
>   * Parts of this function taken from xfs_change_file_space()
>   */
> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  {
>  	int ret;
>  	s64 llen;
> -	loff_t size;
> +	loff_t size, orig_isize;
>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>  	struct buffer_head *di_bh = NULL;
>  	handle_t *handle;
> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  		goto out_inode_unlock;
>  	}
>  
> +	orig_isize = i_size_read(inode);
>  	switch (sr->l_whence) {
>  	case 0: /*SEEK_SET*/
>  		break;
> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  		sr->l_start += f_pos;
>  		break;
>  	case 2: /*SEEK_END*/
> -		sr->l_start += i_size_read(inode);
> +		sr->l_start += orig_isize;
>  		break;
>  	default:
>  		ret = -EINVAL;
> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  	default:
>  		ret = -EINVAL;
>  	}
> +
> +	/* zeroout eof blocks in the cluster. */
> +	if (!ret && change_size && orig_isize < size)
> +		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
> +					size - orig_isize);
>  	up_write(&OCFS2_I(inode)->ip_alloc_sem);
>  	if (ret) {
>  		mlog_errno(ret);
>
* Re: [PATCH v2] ocfs2: fix data corruption by fallocate
From: Junxiao Bi @ 2021-05-24 16:23 UTC
To: Joseph Qi, ocfs2-devel; +Cc: jack, linux-fsdevel

That will not work. The buffered write zeroes the blocks first and only
then is i_size updated; in between, writeback could kick in and drop
those dirty buffers because they are beyond i_size. Besides that,
OCFS2_IOC_RESVSP64 never did the right job: it did not take care of the
eof blocks in the last cluster, so even a simple fallocate that extends
the file size could cause corruption. This patch fixes both issues.

Thanks,

Junxiao.

On 5/23/21 4:52 AM, Joseph Qi wrote:
> Hi Junxiao,
> If change_size is true (!FALLOC_FL_KEEP_SIZE), it will update isize
> in __ocfs2_change_file_space(). Why do we have to zeroout first?
>
> Thanks,
> Joseph
>
> On 5/22/21 7:36 AM, Junxiao Bi wrote:
>> When fallocate punches holes out of inode size, if original isize is in
>> the middle of last cluster, then the part from isize to the end of the
>> cluster will be zeroed with buffer write, at that time isize is not
>> yet updated to match the new size, if writeback is kicked in, it will
>> invoke ocfs2_writepage()->block_write_full_page() where the pages out
>> of inode size will be dropped. That will cause file corruption. Fix
>> this by zeroing out eof blocks when extending the inode size.
>>
>> Running the following command with qemu-img 4.2.1 can get a corrupted
>> converted image file easily.
>>
>>     qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
>>              -O qcow2 -o compat=1.1 $qcow_image.conv
>>
>> The usage of fallocate in qemu is like this, it first punches holes out of
>> inode size, then extends the inode size.
>>
>>     fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
>>     fallocate(11, 0, 2276196352, 65536) = 0
>>
>> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
>>
>> Cc: <stable@vger.kernel.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> ---
>>
>> Changes in v2:
>> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly.
>>
>>  fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 47 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index f17c3d33fb18..17469fc7b20e 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode,
>>  	return ret;
>>  }
>>  
>> +/*
>> + * zero out partial blocks of one cluster.
>> + *
>> + * start: file offset where zero starts, will be made upper block aligned.
>> + * len: it will be trimmed to the end of current cluster if "start + len"
>> + *      is bigger than it.
>> + */
>> +static int ocfs2_zeroout_partial_cluster(struct inode *inode,
>> +					u64 start, u64 len)
>> +{
>> +	int ret;
>> +	u64 start_block, end_block, nr_blocks;
>> +	u64 p_block, offset;
>> +	u32 cluster, p_cluster, nr_clusters;
>> +	struct super_block *sb = inode->i_sb;
>> +	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
>> +
>> +	if (start + len < end)
>> +		end = start + len;
>> +
>> +	start_block = ocfs2_blocks_for_bytes(sb, start);
>> +	end_block = ocfs2_blocks_for_bytes(sb, end);
>> +	nr_blocks = end_block - start_block;
>> +	if (!nr_blocks)
>> +		return 0;
>> +
>> +	cluster = ocfs2_bytes_to_clusters(sb, start);
>> +	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
>> +				&nr_clusters, NULL);
>> +	if (ret)
>> +		return ret;
>> +	if (!p_cluster)
>> +		return 0;
>> +
>> +	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
>> +	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
>> +	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
>> +}
>> +
>>  /*
>>   * Parts of this function taken from xfs_change_file_space()
>>   */
>> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  {
>>  	int ret;
>>  	s64 llen;
>> -	loff_t size;
>> +	loff_t size, orig_isize;
>>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>  	struct buffer_head *di_bh = NULL;
>>  	handle_t *handle;
>> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  		goto out_inode_unlock;
>>  	}
>>  
>> +	orig_isize = i_size_read(inode);
>>  	switch (sr->l_whence) {
>>  	case 0: /*SEEK_SET*/
>>  		break;
>> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  		sr->l_start += f_pos;
>>  		break;
>>  	case 2: /*SEEK_END*/
>> -		sr->l_start += i_size_read(inode);
>> +		sr->l_start += orig_isize;
>>  		break;
>>  	default:
>>  		ret = -EINVAL;
>> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  	default:
>>  		ret = -EINVAL;
>>  	}
>> +
>> +	/* zeroout eof blocks in the cluster. */
>> +	if (!ret && change_size && orig_isize < size)
>> +		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
>> +					size - orig_isize);
>>  	up_write(&OCFS2_I(inode)->ip_alloc_sem);
>>  	if (ret) {
>>  		mlog_errno(ret);
>>
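
The ordering problem described in this reply can be sketched with a toy model. This is not kernel code: struct toy_file and the toy_* functions are hypothetical, one byte stands in for each block, and writeback is reduced to the single property the thread relies on, namely that dirty data beyond i_size is dropped instead of written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBLOCKS 4

/* Hypothetical model of one file: on-disk blocks plus a page-cache copy. */
struct toy_file {
	unsigned char disk[TOY_NBLOCKS];	/* what is on disk */
	unsigned char cache[TOY_NBLOCKS];	/* page-cache contents */
	bool dirty[TOY_NBLOCKS];		/* block has unwritten cache data */
	size_t i_size;				/* file size, in blocks */
};

/* Buffered zeroing: only the cache changes until writeback runs. */
static void toy_buffered_zero(struct toy_file *f, size_t blk)
{
	assert(blk < TOY_NBLOCKS);
	f->cache[blk] = 0;
	f->dirty[blk] = true;
}

/* Writeback: dirty blocks beyond i_size are dropped, not written. */
static void toy_writeback(struct toy_file *f)
{
	for (size_t b = 0; b < TOY_NBLOCKS; b++) {
		if (!f->dirty[b])
			continue;
		if (b < f->i_size)
			f->disk[b] = f->cache[b];
		f->dirty[b] = false;
	}
}

/* Direct zeroout: goes straight to disk, bypassing the cache. */
static void toy_disk_zeroout(struct toy_file *f, size_t blk)
{
	assert(blk < TOY_NBLOCKS);
	f->disk[blk] = 0;
}

/* Old ordering: buffered zero, then i_size update. Returns on-disk block 2. */
static unsigned char toy_extend_buffered(bool writeback_races)
{
	struct toy_file f = { .disk = { 1, 1, 0xAA, 0 }, .i_size = 2 };

	toy_buffered_zero(&f, 2);
	if (writeback_races)
		toy_writeback(&f);	/* runs before i_size is updated */
	f.i_size = 3;
	toy_writeback(&f);
	return f.disk[2];
}

/* Patched ordering: zero the block on disk, then update i_size. */
static unsigned char toy_extend_zeroout(bool writeback_races)
{
	struct toy_file f = { .disk = { 1, 1, 0xAA, 0 }, .i_size = 2 };

	toy_disk_zeroout(&f, 2);
	if (writeback_races)
		toy_writeback(&f);
	f.i_size = 3;
	toy_writeback(&f);
	return f.disk[2];
}
```

In this model toy_extend_buffered(true) leaves the stale 0xAA on disk inside the new file size, while toy_extend_zeroout() ends with a zeroed block whether or not writeback runs in between.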
* Re: [PATCH v2] ocfs2: fix data corruption by fallocate 2021-05-24 16:23 ` [Ocfs2-devel] " Junxiao Bi @ 2021-05-25 2:04 ` Joseph Qi -1 siblings, 0 replies; 20+ messages in thread From: Joseph Qi @ 2021-05-25 2:04 UTC (permalink / raw) To: Junxiao Bi, ocfs2-devel; +Cc: jack, linux-fsdevel Thanks for the explanations. A tiny cleanup, we can use 'orig_isize' instead of i_size_read() later in __ocfs2_change_file_space(). Other looks good to me. Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> On 5/25/21 12:23 AM, Junxiao Bi wrote: > That will not work, buffer write zero first, then update i_size, in between writeback could be kicked in and clear those dirty buffers because they were out of i_size. Beside that, OCFS2_IOC_RESVSP64 was never doing right job, it didn't take care eof blocks in the last cluster, that made even a simple fallocate to extend file size could cause corruption. This patch fixed both issues. > > Thanks, > > Junxiao. > > On 5/23/21 4:52 AM, Joseph Qi wrote: >> Hi Junxiao, >> If change_size is true (!FALLOC_FL_KEEP_SIZE), it will update isize >> in __ocfs2_change_file_space(). Why do we have to zeroout first? >> >> Thanks, >> Joseph >> >> On 5/22/21 7:36 AM, Junxiao Bi wrote: >>> When fallocate punches holes out of inode size, if original isize is in >>> the middle of last cluster, then the part from isize to the end of the >>> cluster will be zeroed with buffer write, at that time isize is not >>> yet updated to match the new size, if writeback is kicked in, it will >>> invoke ocfs2_writepage()->block_write_full_page() where the pages out >>> of inode size will be dropped. That will cause file corruption. Fix >>> this by zero out eof blocks when extending the inode size. >>> >>> Running the following command with qemu-image 4.2.1 can get a corrupted >>> coverted image file easily. 
>>> >>> qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ >>> -O qcow2 -o compat=1.1 $qcow_image.conv >>> >>> The usage of fallocate in qemu is like this, it first punches holes out of >>> inode size, then extend the inode size. >>> >>> fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 >>> fallocate(11, 0, 2276196352, 65536) = 0 >>> >>> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html >>> >>> Cc: <stable@vger.kernel.org> >>> Cc: Jan Kara <jack@suse.cz> >>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> >>> --- >>> >>> Changes in v2: >>> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly. >>> >>> fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-- >>> 1 file changed, 47 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>> index f17c3d33fb18..17469fc7b20e 100644 >>> --- a/fs/ocfs2/file.c >>> +++ b/fs/ocfs2/file.c >>> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode, >>> return ret; >>> } >>> +/* >>> + * zero out partial blocks of one cluster. >>> + * >>> + * start: file offset where zero starts, will be made upper block aligned. >>> + * len: it will be trimmed to the end of current cluster if "start + len" >>> + * is bigger than it. 
>>> + */ >>> +static int ocfs2_zeroout_partial_cluster(struct inode *inode, >>> + u64 start, u64 len) >>> +{ >>> + int ret; >>> + u64 start_block, end_block, nr_blocks; >>> + u64 p_block, offset; >>> + u32 cluster, p_cluster, nr_clusters; >>> + struct super_block *sb = inode->i_sb; >>> + u64 end = ocfs2_align_bytes_to_clusters(sb, start); >>> + >>> + if (start + len < end) >>> + end = start + len; >>> + >>> + start_block = ocfs2_blocks_for_bytes(sb, start); >>> + end_block = ocfs2_blocks_for_bytes(sb, end); >>> + nr_blocks = end_block - start_block; >>> + if (!nr_blocks) >>> + return 0; >>> + >>> + cluster = ocfs2_bytes_to_clusters(sb, start); >>> + ret = ocfs2_get_clusters(inode, cluster, &p_cluster, >>> + &nr_clusters, NULL); >>> + if (ret) >>> + return ret; >>> + if (!p_cluster) >>> + return 0; >>> + >>> + offset = start_block - ocfs2_clusters_to_blocks(sb, cluster); >>> + p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset; >>> + return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS); >>> +} >>> + >>> /* >>> * Parts of this function taken from xfs_change_file_space() >>> */ >>> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>> { >>> int ret; >>> s64 llen; >>> - loff_t size; >>> + loff_t size, orig_isize; >>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> struct buffer_head *di_bh = NULL; >>> handle_t *handle; >>> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>> goto out_inode_unlock; >>> } >>> + orig_isize = i_size_read(inode); >>> switch (sr->l_whence) { >>> case 0: /*SEEK_SET*/ >>> break; >>> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>> sr->l_start += f_pos; >>> break; >>> case 2: /*SEEK_END*/ >>> - sr->l_start += i_size_read(inode); >>> + sr->l_start += orig_isize; >>> break; >>> default: >>> ret = -EINVAL; >>> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct 
file *file, struct inode *inode, >>> default: >>> ret = -EINVAL; >>> } >>> + >>> + /* zeroout eof blocks in the cluster. */ >>> + if (!ret && change_size && orig_isize < size) >>> + ret = ocfs2_zeroout_partial_cluster(inode, orig_isize, >>> + size - orig_isize); >>> up_write(&OCFS2_I(inode)->ip_alloc_sem); >>> if (ret) { >>> mlog_errno(ret); >>> ^ permalink raw reply [flat|nested] 20+ messages in thread
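Editorial aside: the fallocate sequence quoted from the qemu trace above (punch a hole beyond the inode size with KEEP_SIZE, then extend the file over the same range) can be sketched as a small userspace check. This is only a sketch under assumptions: the helper name punch_then_extend(), the 0xAA fill pattern and the sizes a caller passes in are hypothetical and do not come from the patch, and a single run is not a reliable reproducer of the writeback race described in the commit message. It only shows the call order and what a correct filesystem should return for the extended range.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): create a file of isize
 * bytes, punch a hole beyond EOF while keeping the size, then extend the
 * file over the same range, mirroring the two fallocate() calls in the
 * quoted trace.
 *
 * Returns 0 if the file grew to offset + len and every byte from isize to
 * the new EOF reads back as zero, 1 if the size is wrong or non-zero data
 * is found, or a negative errno value if a system call failed.
 */
int punch_then_extend(const char *path, off_t isize, off_t offset, off_t len)
{
	struct stat st;
	unsigned char *buf = NULL;
	off_t i, tail;
	ssize_t n;
	int fd, ret = 0;

	if (isize <= 0 || offset < isize || len <= 0)
		return -EINVAL;
	tail = offset + len - isize;	/* bytes between old and new EOF */

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -errno;

	buf = malloc(tail > isize ? tail : isize);
	if (!buf) {
		ret = -ENOMEM;
		goto out;
	}

	/* Fill the file up to the original size with non-zero data. */
	memset(buf, 0xAA, isize);
	n = write(fd, buf, isize);
	if (n < 0) {
		ret = -errno;
		goto out;
	} else if (n != isize) {
		ret = -EIO;
		goto out;
	}

	/* Step 1: punch a hole beyond the inode size, keeping the size. */
	if (fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
		      offset, len) < 0) {
		ret = -errno;
		goto out;
	}

	/* Step 2: extend the inode size over the same range. */
	if (fallocate(fd, 0, offset, len) < 0) {
		ret = -errno;
		goto out;
	}

	if (fstat(fd, &st) < 0) {
		ret = -errno;
		goto out;
	}
	if (st.st_size != offset + len) {
		ret = 1;
		goto out;
	}

	/* Everything between the old and the new EOF must read as zero. */
	n = pread(fd, buf, tail, isize);
	if (n < 0) {
		ret = -errno;
	} else if (n != tail) {
		ret = -EIO;
	} else {
		for (i = 0; i < tail; i++) {
			if (buf[i]) {
				ret = 1;
				break;
			}
		}
	}
out:
	free(buf);
	close(fd);
	unlink(path);
	return ret;
}
```

A caller would pick an isize that falls in the middle of a cluster and an offset beyond it. On filesystems without hole punching the helper is expected to return a negative errno such as -EOPNOTSUPP rather than 0.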
* Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix data corruption by fallocate
@ 2021-05-25  2:04 ` Joseph Qi
  0 siblings, 0 replies; 20+ messages in thread
From: Joseph Qi @ 2021-05-25  2:04 UTC (permalink / raw)
  To: Junxiao Bi, ocfs2-devel; +Cc: linux-fsdevel, jack

Thanks for the explanations.
A tiny cleanup, we can use 'orig_isize' instead of i_size_read() later
in __ocfs2_change_file_space().
Other looks good to me.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

On 5/25/21 12:23 AM, Junxiao Bi wrote:
> That will not work, buffer write zero first, then update i_size, in between writeback could be kicked in and clear those dirty buffers because they were out of i_size. Beside that, OCFS2_IOC_RESVSP64 was never doing right job, it didn't take care eof blocks in the last cluster, that made even a simple fallocate to extend file size could cause corruption. This patch fixed both issues.
>
> Thanks,
>
> Junxiao.
>
> On 5/23/21 4:52 AM, Joseph Qi wrote:
>> Hi Junxiao,
>> If change_size is true (!FALLOC_FL_KEEP_SIZE), it will update isize
>> in __ocfs2_change_file_space(). Why do we have to zeroout first?
>>
>> Thanks,
>> Joseph
>>
>> On 5/22/21 7:36 AM, Junxiao Bi wrote:
>>> When fallocate punches holes out of inode size, if original isize is in
>>> the middle of last cluster, then the part from isize to the end of the
>>> cluster will be zeroed with buffer write, at that time isize is not
>>> yet updated to match the new size, if writeback is kicked in, it will
>>> invoke ocfs2_writepage()->block_write_full_page() where the pages out
>>> of inode size will be dropped. That will cause file corruption. Fix
>>> this by zero out eof blocks when extending the inode size.
>>>
>>> Running the following command with qemu-image 4.2.1 can get a corrupted
>>> coverted image file easily.
>>> >>> qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ >>> -O qcow2 -o compat=1.1 $qcow_image.conv >>> >>> The usage of fallocate in qemu is like this, it first punches holes out of >>> inode size, then extend the inode size. >>> >>> fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 >>> fallocate(11, 0, 2276196352, 65536) = 0 >>> >>> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html >>> >>> Cc: <stable@vger.kernel.org> >>> Cc: Jan Kara <jack@suse.cz> >>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> >>> --- >>> >>> Changes in v2: >>> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly. >>> >>> fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-- >>> 1 file changed, 47 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>> index f17c3d33fb18..17469fc7b20e 100644 >>> --- a/fs/ocfs2/file.c >>> +++ b/fs/ocfs2/file.c >>> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode, >>> return ret; >>> } >>> +/* >>> + * zero out partial blocks of one cluster. >>> + * >>> + * start: file offset where zero starts, will be made upper block aligned. >>> + * len: it will be trimmed to the end of current cluster if "start + len" >>> + * is bigger than it. 
>>> + */ >>> +static int ocfs2_zeroout_partial_cluster(struct inode *inode, >>> + u64 start, u64 len) >>> +{ >>> + int ret; >>> + u64 start_block, end_block, nr_blocks; >>> + u64 p_block, offset; >>> + u32 cluster, p_cluster, nr_clusters; >>> + struct super_block *sb = inode->i_sb; >>> + u64 end = ocfs2_align_bytes_to_clusters(sb, start); >>> + >>> + if (start + len < end) >>> + end = start + len; >>> + >>> + start_block = ocfs2_blocks_for_bytes(sb, start); >>> + end_block = ocfs2_blocks_for_bytes(sb, end); >>> + nr_blocks = end_block - start_block; >>> + if (!nr_blocks) >>> + return 0; >>> + >>> + cluster = ocfs2_bytes_to_clusters(sb, start); >>> + ret = ocfs2_get_clusters(inode, cluster, &p_cluster, >>> + &nr_clusters, NULL); >>> + if (ret) >>> + return ret; >>> + if (!p_cluster) >>> + return 0; >>> + >>> + offset = start_block - ocfs2_clusters_to_blocks(sb, cluster); >>> + p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset; >>> + return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS); >>> +} >>> + >>> /* >>> * Parts of this function taken from xfs_change_file_space() >>> */ >>> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>> { >>> int ret; >>> s64 llen; >>> - loff_t size; >>> + loff_t size, orig_isize; >>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> struct buffer_head *di_bh = NULL; >>> handle_t *handle; >>> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>> goto out_inode_unlock; >>> } >>> + orig_isize = i_size_read(inode); >>> switch (sr->l_whence) { >>> case 0: /*SEEK_SET*/ >>> break; >>> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>> sr->l_start += f_pos; >>> break; >>> case 2: /*SEEK_END*/ >>> - sr->l_start += i_size_read(inode); >>> + sr->l_start += orig_isize; >>> break; >>> default: >>> ret = -EINVAL; >>> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct 
file *file, struct inode *inode, >>> default: >>> ret = -EINVAL; >>> } >>> + >>> + /* zeroout eof blocks in the cluster. */ >>> + if (!ret && change_size && orig_isize < size) >>> + ret = ocfs2_zeroout_partial_cluster(inode, orig_isize, >>> + size - orig_isize); >>> up_write(&OCFS2_I(inode)->ip_alloc_sem); >>> if (ret) { >>> mlog_errno(ret); >>> _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] ocfs2: fix data corruption by fallocate
  2021-05-25  2:04 ` [Ocfs2-devel] " Joseph Qi
@ 2021-05-25 17:58 ` Junxiao Bi
  -1 siblings, 0 replies; 20+ messages in thread
From: Junxiao Bi @ 2021-05-25 17:58 UTC (permalink / raw)
  To: Joseph Qi, ocfs2-devel; +Cc: jack, linux-fsdevel

I would like make the following change to the patch, is that ok to you?

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 17469fc7b20e..775657943057 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1999,9 +1999,12 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 	}
 
 	/* zeroout eof blocks in the cluster. */
-	if (!ret && change_size && orig_isize < size)
+	if (!ret && change_size && orig_isize < size) {
 		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
 					size - orig_isize);
+		if (!ret)
+			i_size_write(inode, size);
+	}
 	up_write(&OCFS2_I(inode)->ip_alloc_sem);
 	if (ret) {
 		mlog_errno(ret);
@@ -2018,9 +2021,6 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
 		goto out_inode_unlock;
 	}
 
-	if (change_size && i_size_read(inode) < size)
-		i_size_write(inode, size);
-
 	inode->i_ctime = inode->i_mtime = current_time(inode);
 	ret = ocfs2_mark_inode_dirty(handle, inode, di_bh);
 	if (ret < 0)

Thanks,

Junxiao.

On 5/24/21 7:04 PM, Joseph Qi wrote:
> Thanks for the explanations.
> A tiny cleanup, we can use 'orig_isize' instead of i_size_read() later
> in __ocfs2_change_file_space().
> Other looks good to me.
> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
>
> On 5/25/21 12:23 AM, Junxiao Bi wrote:
>> That will not work, buffer write zero first, then update i_size, in between writeback could be kicked in and clear those dirty buffers because they were out of i_size. Beside that, OCFS2_IOC_RESVSP64 was never doing right job, it didn't take care eof blocks in the last cluster, that made even a simple fallocate to extend file size could cause corruption. This patch fixed both issues.
>>
>> Thanks,
>>
>> Junxiao.
>> >> On 5/23/21 4:52 AM, Joseph Qi wrote: >>> Hi Junxiao, >>> If change_size is true (!FALLOC_FL_KEEP_SIZE), it will update isize >>> in __ocfs2_change_file_space(). Why do we have to zeroout first? >>> >>> Thanks, >>> Joseph >>> >>> On 5/22/21 7:36 AM, Junxiao Bi wrote: >>>> When fallocate punches holes out of inode size, if original isize is in >>>> the middle of last cluster, then the part from isize to the end of the >>>> cluster will be zeroed with buffer write, at that time isize is not >>>> yet updated to match the new size, if writeback is kicked in, it will >>>> invoke ocfs2_writepage()->block_write_full_page() where the pages out >>>> of inode size will be dropped. That will cause file corruption. Fix >>>> this by zero out eof blocks when extending the inode size. >>>> >>>> Running the following command with qemu-image 4.2.1 can get a corrupted >>>> coverted image file easily. >>>> >>>> qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ >>>> -O qcow2 -o compat=1.1 $qcow_image.conv >>>> >>>> The usage of fallocate in qemu is like this, it first punches holes out of >>>> inode size, then extend the inode size. >>>> >>>> fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 >>>> fallocate(11, 0, 2276196352, 65536) = 0 >>>> >>>> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html >>>> >>>> Cc: <stable@vger.kernel.org> >>>> Cc: Jan Kara <jack@suse.cz> >>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> >>>> --- >>>> >>>> Changes in v2: >>>> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly. 
>>>> >>>> fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-- >>>> 1 file changed, 47 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>>> index f17c3d33fb18..17469fc7b20e 100644 >>>> --- a/fs/ocfs2/file.c >>>> +++ b/fs/ocfs2/file.c >>>> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode, >>>> return ret; >>>> } >>>> +/* >>>> + * zero out partial blocks of one cluster. >>>> + * >>>> + * start: file offset where zero starts, will be made upper block aligned. >>>> + * len: it will be trimmed to the end of current cluster if "start + len" >>>> + * is bigger than it. >>>> + */ >>>> +static int ocfs2_zeroout_partial_cluster(struct inode *inode, >>>> + u64 start, u64 len) >>>> +{ >>>> + int ret; >>>> + u64 start_block, end_block, nr_blocks; >>>> + u64 p_block, offset; >>>> + u32 cluster, p_cluster, nr_clusters; >>>> + struct super_block *sb = inode->i_sb; >>>> + u64 end = ocfs2_align_bytes_to_clusters(sb, start); >>>> + >>>> + if (start + len < end) >>>> + end = start + len; >>>> + >>>> + start_block = ocfs2_blocks_for_bytes(sb, start); >>>> + end_block = ocfs2_blocks_for_bytes(sb, end); >>>> + nr_blocks = end_block - start_block; >>>> + if (!nr_blocks) >>>> + return 0; >>>> + >>>> + cluster = ocfs2_bytes_to_clusters(sb, start); >>>> + ret = ocfs2_get_clusters(inode, cluster, &p_cluster, >>>> + &nr_clusters, NULL); >>>> + if (ret) >>>> + return ret; >>>> + if (!p_cluster) >>>> + return 0; >>>> + >>>> + offset = start_block - ocfs2_clusters_to_blocks(sb, cluster); >>>> + p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset; >>>> + return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS); >>>> +} >>>> + >>>> /* >>>> * Parts of this function taken from xfs_change_file_space() >>>> */ >>>> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>>> { >>>> int ret; >>>> s64 llen; >>>> - loff_t size; >>>> + loff_t size, orig_isize; >>>> struct 
ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>>> struct buffer_head *di_bh = NULL; >>>> handle_t *handle; >>>> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>>> goto out_inode_unlock; >>>> } >>>> + orig_isize = i_size_read(inode); >>>> switch (sr->l_whence) { >>>> case 0: /*SEEK_SET*/ >>>> break; >>>> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>>> sr->l_start += f_pos; >>>> break; >>>> case 2: /*SEEK_END*/ >>>> - sr->l_start += i_size_read(inode); >>>> + sr->l_start += orig_isize; >>>> break; >>>> default: >>>> ret = -EINVAL; >>>> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>>> default: >>>> ret = -EINVAL; >>>> } >>>> + >>>> + /* zeroout eof blocks in the cluster. */ >>>> + if (!ret && change_size && orig_isize < size) >>>> + ret = ocfs2_zeroout_partial_cluster(inode, orig_isize, >>>> + size - orig_isize); >>>> up_write(&OCFS2_I(inode)->ip_alloc_sem); >>>> if (ret) { >>>> mlog_errno(ret); >>>> ^ permalink raw reply related [flat|nested] 20+ messages in thread
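Editorial aside: the call ocfs2_zeroout_partial_cluster(inode, orig_isize, size - orig_isize) discussed in this message depends on how the byte range is turned into a block range. The sketch below replays that arithmetic in userspace. It is a sketch under assumptions: struct geom and the three functions are hypothetical stand-ins, not the kernel helpers, and they assume that the "align to clusters" and "blocks for bytes" helpers round up, which is how the patch comment ("will be made upper block aligned") reads. The physical cluster lookup and sb_issue_zeroout() are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock geometry (values are assumptions). */
struct geom {
	unsigned int block_bits;	/* log2(block size), e.g. 12 for 4 KiB */
	unsigned int cluster_bits;	/* log2(cluster size), e.g. 20 for 1 MiB */
};

/* Round a byte offset up to the next cluster boundary. */
static uint64_t align_bytes_to_clusters(const struct geom *g, uint64_t bytes)
{
	uint64_t mask = ((uint64_t)1 << g->cluster_bits) - 1;

	return (bytes + mask) & ~mask;
}

/* Number of blocks needed to hold `bytes`, i.e. bytes rounded up to blocks. */
static uint64_t blocks_for_bytes(const struct geom *g, uint64_t bytes)
{
	uint64_t mask = ((uint64_t)1 << g->block_bits) - 1;

	return (bytes + mask) >> g->block_bits;
}

/*
 * Mirror the range computation of the patch: zeroing starts at the first
 * block boundary at or after `start` and stops at the end of the cluster
 * containing `start`, or at `start + len` if that comes first.
 *
 * Returns the number of blocks that would be zeroed and, if first_block is
 * not NULL, stores the logical block where zeroing would begin.
 */
uint64_t partial_cluster_blocks(const struct geom *g, uint64_t start,
				uint64_t len, uint64_t *first_block)
{
	uint64_t end = align_bytes_to_clusters(g, start);
	uint64_t start_block, end_block;

	assert(g->block_bits <= g->cluster_bits);

	if (start + len < end)
		end = start + len;

	start_block = blocks_for_bytes(g, start);
	end_block = blocks_for_bytes(g, end);
	if (first_block)
		*first_block = start_block;

	return end_block - start_block;
}
```

With 4 KiB blocks and 1 MiB clusters, a start of 5000 and a len of 100000 gives 24 blocks beginning at logical block 2, while a start that already sits on a cluster boundary gives 0 blocks, which corresponds to the early "return 0" in the patch.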
* Re: [PATCH v2] ocfs2: fix data corruption by fallocate
  2021-05-25 17:58 ` [Ocfs2-devel] " Junxiao Bi
@ 2021-05-26  2:11 ` Joseph Qi
  -1 siblings, 0 replies; 20+ messages in thread
From: Joseph Qi @ 2021-05-26  2:11 UTC (permalink / raw)
  To: Junxiao Bi, ocfs2-devel; +Cc: jack, linux-fsdevel

Can we simply replace i_size_read() with 'orig_isize' and leave isize
update along with other dirty inode operations?
I think this makes more comfortable for the dirty inode transaction.

Thanks,
Joseph

On 5/26/21 1:58 AM, Junxiao Bi wrote:
> I would like make the following change to the patch, is that ok to you?
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 17469fc7b20e..775657943057 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1999,9 +1999,12 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  	}
>
>  	/* zeroout eof blocks in the cluster. */
> -	if (!ret && change_size && orig_isize < size)
> +	if (!ret && change_size && orig_isize < size) {
>  		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
>  					size - orig_isize);
> +		if (!ret)
> +			i_size_write(inode, size);
> +	}
>  	up_write(&OCFS2_I(inode)->ip_alloc_sem);
>  	if (ret) {
>  		mlog_errno(ret);
> @@ -2018,9 +2021,6 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  		goto out_inode_unlock;
>  	}
>
> -	if (change_size && i_size_read(inode) < size)
> -		i_size_write(inode, size);
> -
>  	inode->i_ctime = inode->i_mtime = current_time(inode);
>  	ret = ocfs2_mark_inode_dirty(handle, inode, di_bh);
>  	if (ret < 0)
>
> Thanks,
>
> Junxiao.
>
> On 5/24/21 7:04 PM, Joseph Qi wrote:
>> Thanks for the explanations.
>> A tiny cleanup, we can use 'orig_isize' instead of i_size_read() later
>> in __ocfs2_change_file_space().
>> Other looks good to me.
>> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> >> >> On 5/25/21 12:23 AM, Junxiao Bi wrote: >>> That will not work, buffer write zero first, then update i_size, in between writeback could be kicked in and clear those dirty buffers because they were out of i_size. Beside that, OCFS2_IOC_RESVSP64 was never doing right job, it didn't take care eof blocks in the last cluster, that made even a simple fallocate to extend file size could cause corruption. This patch fixed both issues. >>> >>> Thanks, >>> >>> Junxiao. >>> >>> On 5/23/21 4:52 AM, Joseph Qi wrote: >>>> Hi Junxiao, >>>> If change_size is true (!FALLOC_FL_KEEP_SIZE), it will update isize >>>> in __ocfs2_change_file_space(). Why do we have to zeroout first? >>>> >>>> Thanks, >>>> Joseph >>>> >>>> On 5/22/21 7:36 AM, Junxiao Bi wrote: >>>>> When fallocate punches holes out of inode size, if original isize is in >>>>> the middle of last cluster, then the part from isize to the end of the >>>>> cluster will be zeroed with buffer write, at that time isize is not >>>>> yet updated to match the new size, if writeback is kicked in, it will >>>>> invoke ocfs2_writepage()->block_write_full_page() where the pages out >>>>> of inode size will be dropped. That will cause file corruption. Fix >>>>> this by zero out eof blocks when extending the inode size. >>>>> >>>>> Running the following command with qemu-image 4.2.1 can get a corrupted >>>>> coverted image file easily. >>>>> >>>>> qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ >>>>> -O qcow2 -o compat=1.1 $qcow_image.conv >>>>> >>>>> The usage of fallocate in qemu is like this, it first punches holes out of >>>>> inode size, then extend the inode size. 
>>>>> >>>>> fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 >>>>> fallocate(11, 0, 2276196352, 65536) = 0 >>>>> >>>>> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html >>>>> >>>>> Cc: <stable@vger.kernel.org> >>>>> Cc: Jan Kara <jack@suse.cz> >>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> >>>>> --- >>>>> >>>>> Changes in v2: >>>>> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly. >>>>> >>>>> fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-- >>>>> 1 file changed, 47 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>>>> index f17c3d33fb18..17469fc7b20e 100644 >>>>> --- a/fs/ocfs2/file.c >>>>> +++ b/fs/ocfs2/file.c >>>>> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode, >>>>> return ret; >>>>> } >>>>> +/* >>>>> + * zero out partial blocks of one cluster. >>>>> + * >>>>> + * start: file offset where zero starts, will be made upper block aligned. >>>>> + * len: it will be trimmed to the end of current cluster if "start + len" >>>>> + * is bigger than it. 
>>>>> + */ >>>>> +static int ocfs2_zeroout_partial_cluster(struct inode *inode, >>>>> + u64 start, u64 len) >>>>> +{ >>>>> + int ret; >>>>> + u64 start_block, end_block, nr_blocks; >>>>> + u64 p_block, offset; >>>>> + u32 cluster, p_cluster, nr_clusters; >>>>> + struct super_block *sb = inode->i_sb; >>>>> + u64 end = ocfs2_align_bytes_to_clusters(sb, start); >>>>> + >>>>> + if (start + len < end) >>>>> + end = start + len; >>>>> + >>>>> + start_block = ocfs2_blocks_for_bytes(sb, start); >>>>> + end_block = ocfs2_blocks_for_bytes(sb, end); >>>>> + nr_blocks = end_block - start_block; >>>>> + if (!nr_blocks) >>>>> + return 0; >>>>> + >>>>> + cluster = ocfs2_bytes_to_clusters(sb, start); >>>>> + ret = ocfs2_get_clusters(inode, cluster, &p_cluster, >>>>> + &nr_clusters, NULL); >>>>> + if (ret) >>>>> + return ret; >>>>> + if (!p_cluster) >>>>> + return 0; >>>>> + >>>>> + offset = start_block - ocfs2_clusters_to_blocks(sb, cluster); >>>>> + p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset; >>>>> + return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS); >>>>> +} >>>>> + >>>>> /* >>>>> * Parts of this function taken from xfs_change_file_space() >>>>> */ >>>>> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>>>> { >>>>> int ret; >>>>> s64 llen; >>>>> - loff_t size; >>>>> + loff_t size, orig_isize; >>>>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>>>> struct buffer_head *di_bh = NULL; >>>>> handle_t *handle; >>>>> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>>>> goto out_inode_unlock; >>>>> } >>>>> + orig_isize = i_size_read(inode); >>>>> switch (sr->l_whence) { >>>>> case 0: /*SEEK_SET*/ >>>>> break; >>>>> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>>>> sr->l_start += f_pos; >>>>> break; >>>>> case 2: /*SEEK_END*/ >>>>> - sr->l_start += i_size_read(inode); >>>>> + sr->l_start += orig_isize; 
>>>>> break; >>>>> default: >>>>> ret = -EINVAL; >>>>> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >>>>> default: >>>>> ret = -EINVAL; >>>>> } >>>>> + >>>>> + /* zeroout eof blocks in the cluster. */ >>>>> + if (!ret && change_size && orig_isize < size) >>>>> + ret = ocfs2_zeroout_partial_cluster(inode, orig_isize, >>>>> + size - orig_isize); >>>>> up_write(&OCFS2_I(inode)->ip_alloc_sem); >>>>> if (ret) { >>>>> mlog_errno(ret); >>>>> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2] ocfs2: fix data corruption by fallocate
  2021-05-26 2:11 ` [Ocfs2-devel] " Joseph Qi
@ 2021-05-26 5:10 ` Junxiao Bi
From: Junxiao Bi @ 2021-05-26 5:10 UTC
To: Joseph Qi, ocfs2-devel; +Cc: jack, linux-fsdevel

After moving it there, i_size_write() will be protected by ip_alloc_sem.
ocfs2_dio_end_io_write() updates i_size without holding the inode lock,
but it does hold ip_alloc_sem.

Thanks,

Junxiao.

On 5/25/21 7:11 PM, Joseph Qi wrote:
> Can we simply replace i_size_read() with 'orig_isize' and leave isize
> update along with other dirty inode operations?
> I think this makes more comfortable for the dirty inode transaction.
>
> Thanks,
> Joseph
>
> On 5/26/21 1:58 AM, Junxiao Bi wrote:
>> I would like make the following change to the patch, is that ok to you?
>>
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index 17469fc7b20e..775657943057 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -1999,9 +1999,12 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  	}
>>
>>  	/* zeroout eof blocks in the cluster. */
>> -	if (!ret && change_size && orig_isize < size)
>> +	if (!ret && change_size && orig_isize < size) {
>>  		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
>>  					size - orig_isize);
>> +		if (!ret)
>> +			i_size_write(inode, size);
>> +	}
>>  	up_write(&OCFS2_I(inode)->ip_alloc_sem);
>>  	if (ret) {
>>  		mlog_errno(ret);
>> @@ -2018,9 +2021,6 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  		goto out_inode_unlock;
>>  	}
>>
>> -	if (change_size && i_size_read(inode) < size)
>> -		i_size_write(inode, size);
>> -
>>  	inode->i_ctime = inode->i_mtime = current_time(inode);
>>  	ret = ocfs2_mark_inode_dirty(handle, inode, di_bh);
>>  	if (ret < 0)
>>
>> Thanks,
>>
>> Junxiao.
>>
>> On 5/24/21 7:04 PM, Joseph Qi wrote:
>>> Thanks for the explanations.
>>> A tiny cleanup, we can use 'orig_isize' instead of i_size_read() later
>>> in __ocfs2_change_file_space().
>>> Other looks good to me.
>>> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
>>>
>>> On 5/25/21 12:23 AM, Junxiao Bi wrote:
>>>> That will not work, buffer write zero first, then update i_size, in
>>>> between writeback could be kicked in and clear those dirty buffers
>>>> because they were out of i_size. Beside that, OCFS2_IOC_RESVSP64 was
>>>> never doing right job, it didn't take care eof blocks in the last
>>>> cluster, that made even a simple fallocate to extend file size could
>>>> cause corruption. This patch fixed both issues.
>>>>
>>>> Thanks,
>>>>
>>>> Junxiao.
>>>>
>>>> On 5/23/21 4:52 AM, Joseph Qi wrote:
>>>>> Hi Junxiao,
>>>>> If change_size is true (!FALLOC_FL_KEEP_SIZE), it will update isize
>>>>> in __ocfs2_change_file_space(). Why do we have to zeroout first?
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>
>>>>> On 5/22/21 7:36 AM, Junxiao Bi wrote:
>>>>>> When fallocate punches holes out of inode size, if original isize is in
>>>>>> the middle of last cluster, then the part from isize to the end of the
>>>>>> cluster will be zeroed with buffer write, at that time isize is not
>>>>>> yet updated to match the new size, if writeback is kicked in, it will
>>>>>> invoke ocfs2_writepage()->block_write_full_page() where the pages out
>>>>>> of inode size will be dropped. That will cause file corruption. Fix
>>>>>> this by zero out eof blocks when extending the inode size.
>>>>>>
>>>>>> Running the following command with qemu-image 4.2.1 can get a corrupted
>>>>>> coverted image file easily.
>>>>>>
>>>>>>     qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
>>>>>>              -O qcow2 -o compat=1.1 $qcow_image.conv
>>>>>>
>>>>>> The usage of fallocate in qemu is like this, it first punches holes out of
>>>>>> inode size, then extend the inode size.
>>>>>>
>>>>>>     fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
>>>>>>     fallocate(11, 0, 2276196352, 65536) = 0
>>>>>>
>>>>>> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
>>>>>>
>>>>>> Cc: <stable@vger.kernel.org>
>>>>>> Cc: Jan Kara <jack@suse.cz>
>>>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>>>> ---
>>>>>>
>>>>>> Changes in v2:
>>>>>> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly.
>>>>>>
>>>>>>  fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>  1 file changed, 47 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>>>>>> index f17c3d33fb18..17469fc7b20e 100644
>>>>>> --- a/fs/ocfs2/file.c
>>>>>> +++ b/fs/ocfs2/file.c
>>>>>> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode,
>>>>>>  	return ret;
>>>>>>  }
>>>>>> +/*
>>>>>> + * zero out partial blocks of one cluster.
>>>>>> + *
>>>>>> + * start: file offset where zero starts, will be made upper block aligned.
>>>>>> + * len: it will be trimmed to the end of current cluster if "start + len"
>>>>>> + *      is bigger than it.
>>>>>> + */
>>>>>> +static int ocfs2_zeroout_partial_cluster(struct inode *inode,
>>>>>> +					u64 start, u64 len)
>>>>>> +{
>>>>>> +	int ret;
>>>>>> +	u64 start_block, end_block, nr_blocks;
>>>>>> +	u64 p_block, offset;
>>>>>> +	u32 cluster, p_cluster, nr_clusters;
>>>>>> +	struct super_block *sb = inode->i_sb;
>>>>>> +	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
>>>>>> +
>>>>>> +	if (start + len < end)
>>>>>> +		end = start + len;
>>>>>> +
>>>>>> +	start_block = ocfs2_blocks_for_bytes(sb, start);
>>>>>> +	end_block = ocfs2_blocks_for_bytes(sb, end);
>>>>>> +	nr_blocks = end_block - start_block;
>>>>>> +	if (!nr_blocks)
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	cluster = ocfs2_bytes_to_clusters(sb, start);
>>>>>> +	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
>>>>>> +				&nr_clusters, NULL);
>>>>>> +	if (ret)
>>>>>> +		return ret;
>>>>>> +	if (!p_cluster)
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
>>>>>> +	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
>>>>>> +	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
>>>>>> +}
>>>>>> +
>>>>>>  /*
>>>>>>   * Parts of this function taken from xfs_change_file_space()
>>>>>>   */
>>>>>> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>>  {
>>>>>>  	int ret;
>>>>>>  	s64 llen;
>>>>>> -	loff_t size;
>>>>>> +	loff_t size, orig_isize;
>>>>>>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>  	struct buffer_head *di_bh = NULL;
>>>>>>  	handle_t *handle;
>>>>>> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>>  		goto out_inode_unlock;
>>>>>>  	}
>>>>>> +	orig_isize = i_size_read(inode);
>>>>>>  	switch (sr->l_whence) {
>>>>>>  	case 0: /*SEEK_SET*/
>>>>>>  		break;
>>>>>> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>>  		sr->l_start += f_pos;
>>>>>>  		break;
>>>>>>  	case 2: /*SEEK_END*/
>>>>>> -		sr->l_start += i_size_read(inode);
>>>>>> +		sr->l_start += orig_isize;
>>>>>>  		break;
>>>>>>  	default:
>>>>>>  		ret = -EINVAL;
>>>>>> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>>>>>  	default:
>>>>>>  		ret = -EINVAL;
>>>>>>  	}
>>>>>> +
>>>>>> +	/* zeroout eof blocks in the cluster. */
>>>>>> +	if (!ret && change_size && orig_isize < size)
>>>>>> +		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
>>>>>> +					size - orig_isize);
>>>>>>  	up_write(&OCFS2_I(inode)->ip_alloc_sem);
>>>>>>  	if (ret) {
>>>>>>  		mlog_errno(ret);
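
The ordering problem discussed in this message and in the quoted 5/25 reply can be pictured with a small userspace model. This is only a toy sketch, not OCFS2 code: `toy_file`, `toy_writeback()` and the one-byte-per-block layout are invented for illustration. The only behaviour taken from the thread is that writeback drops dirty buffers that lie beyond i_size instead of writing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 4
#define STALE   0xAA    /* stands for old, non-zero data left on disk */

/* Hypothetical toy file: one byte stands in for a whole block. */
struct toy_file {
	uint8_t  disk[NBLOCKS];   /* what is actually on disk */
	bool     dirty[NBLOCKS];  /* page cache holds zeroes not yet written */
	unsigned i_size_blocks;   /* file size, counted in blocks */
};

static void toy_init(struct toy_file *f)
{
	memset(f->disk, STALE, sizeof(f->disk));
	memset(f->dirty, 0, sizeof(f->dirty));
	f->i_size_blocks = 1;
}

/* Model of writeback: buffers at or past i_size are dropped, not written. */
static void toy_writeback(struct toy_file *f)
{
	assert(f->i_size_blocks <= NBLOCKS);
	for (unsigned b = 0; b < NBLOCKS; b++) {
		if (!f->dirty[b])
			continue;
		if (b < f->i_size_blocks)
			f->disk[b] = 0;   /* inside the file: zeroes reach disk */
		f->dirty[b] = false;      /* beyond EOF: silently discarded */
	}
}

/* Buffered zeroing first, i_size update later, writeback in between. */
uint8_t buffered_zero_then_extend(void)
{
	struct toy_file f;

	toy_init(&f);
	f.dirty[2] = true;       /* zeroes queued in the page cache, past EOF */
	toy_writeback(&f);       /* runs before i_size has been moved */
	f.i_size_blocks = 3;     /* block 2 is now part of the file */
	return f.disk[2];
}

/* Zero the block on disk directly, then extend i_size. */
uint8_t disk_zero_then_extend(void)
{
	struct toy_file f;

	toy_init(&f);
	f.disk[2] = 0;           /* models a direct on-disk zeroout */
	toy_writeback(&f);       /* nothing dirty, nothing to drop */
	f.i_size_blocks = 3;
	return f.disk[2];
}
```

Under these assumptions, `buffered_zero_then_extend()` still returns the stale byte, while `disk_zero_then_extend()` returns zero, which is the difference the v2 approach relies on.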
* Re: [PATCH v2] ocfs2: fix data corruption by fallocate
  2021-05-21 23:36 ` [Ocfs2-devel] " Junxiao Bi
@ 2021-05-24 8:55 ` Jan Kara
From: Jan Kara @ 2021-05-24 8:55 UTC
To: Junxiao Bi; +Cc: ocfs2-devel, jack, joseph.qi, linux-fsdevel

On Fri 21-05-21 16:36:12, Junxiao Bi wrote:
> When fallocate punches holes out of inode size, if original isize is in
> the middle of last cluster, then the part from isize to the end of the
> cluster will be zeroed with buffer write, at that time isize is not
> yet updated to match the new size, if writeback is kicked in, it will
> invoke ocfs2_writepage()->block_write_full_page() where the pages out
> of inode size will be dropped. That will cause file corruption. Fix
> this by zero out eof blocks when extending the inode size.
>
> Running the following command with qemu-image 4.2.1 can get a corrupted
> coverted image file easily.
>
>     qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
>              -O qcow2 -o compat=1.1 $qcow_image.conv
>
> The usage of fallocate in qemu is like this, it first punches holes out of
> inode size, then extend the inode size.
>
>     fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
>     fallocate(11, 0, 2276196352, 65536) = 0
>
> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
>
> Cc: <stable@vger.kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> ---
>
> Changes in v2:
> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly.
>
>  fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index f17c3d33fb18..17469fc7b20e 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode,
>  	return ret;
>  }
>
> +/*
> + * zero out partial blocks of one cluster.
> + *
> + * start: file offset where zero starts, will be made upper block aligned.
> + * len: it will be trimmed to the end of current cluster if "start + len"
> + *      is bigger than it.

You write this here but ...

> + */
> +static int ocfs2_zeroout_partial_cluster(struct inode *inode,
> +					u64 start, u64 len)
> +{
> +	int ret;
> +	u64 start_block, end_block, nr_blocks;
> +	u64 p_block, offset;
> +	u32 cluster, p_cluster, nr_clusters;
> +	struct super_block *sb = inode->i_sb;
> +	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
> +
> +	if (start + len < end)
> +		end = start + len;

... here you check actually something else and I don't see where else would
the trimming happen.

								Honza

> +
> +	start_block = ocfs2_blocks_for_bytes(sb, start);
> +	end_block = ocfs2_blocks_for_bytes(sb, end);
> +	nr_blocks = end_block - start_block;
> +	if (!nr_blocks)
> +		return 0;
> +
> +	cluster = ocfs2_bytes_to_clusters(sb, start);
> +	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
> +				&nr_clusters, NULL);
> +	if (ret)
> +		return ret;
> +	if (!p_cluster)
> +		return 0;
> +
> +	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
> +	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
> +	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
> +}
> +
>  /*
>   * Parts of this function taken from xfs_change_file_space()
>   */
> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  {
>  	int ret;
>  	s64 llen;
> -	loff_t size;
> +	loff_t size, orig_isize;
>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>  	struct buffer_head *di_bh = NULL;
>  	handle_t *handle;
> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  		goto out_inode_unlock;
>  	}
>
> +	orig_isize = i_size_read(inode);
>  	switch (sr->l_whence) {
>  	case 0: /*SEEK_SET*/
>  		break;
> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  		sr->l_start += f_pos;
>  		break;
>  	case 2: /*SEEK_END*/
> -		sr->l_start += i_size_read(inode);
> +		sr->l_start += orig_isize;
>  		break;
>  	default:
>  		ret = -EINVAL;
> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>  	default:
>  		ret = -EINVAL;
>  	}
> +
> +	/* zeroout eof blocks in the cluster. */
> +	if (!ret && change_size && orig_isize < size)
> +		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
> +					size - orig_isize);
>  	up_write(&OCFS2_I(inode)->ip_alloc_sem);
>  	if (ret) {
>  		mlog_errno(ret);
> --
> 2.24.3 (Apple Git-128)
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v2] ocfs2: fix data corruption by fallocate
  2021-05-24 8:55 ` [Ocfs2-devel] " Jan Kara
@ 2021-05-24 16:14 ` Junxiao Bi
From: Junxiao Bi @ 2021-05-24 16:14 UTC
To: Jan Kara; +Cc: ocfs2-devel, joseph.qi, linux-fsdevel

On 5/24/21 1:55 AM, Jan Kara wrote:
> On Fri 21-05-21 16:36:12, Junxiao Bi wrote:
>> When fallocate punches holes out of inode size, if original isize is in
>> the middle of last cluster, then the part from isize to the end of the
>> cluster will be zeroed with buffer write, at that time isize is not
>> yet updated to match the new size, if writeback is kicked in, it will
>> invoke ocfs2_writepage()->block_write_full_page() where the pages out
>> of inode size will be dropped. That will cause file corruption. Fix
>> this by zero out eof blocks when extending the inode size.
>>
>> Running the following command with qemu-image 4.2.1 can get a corrupted
>> coverted image file easily.
>>
>>     qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
>>              -O qcow2 -o compat=1.1 $qcow_image.conv
>>
>> The usage of fallocate in qemu is like this, it first punches holes out of
>> inode size, then extend the inode size.
>>
>>     fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
>>     fallocate(11, 0, 2276196352, 65536) = 0
>>
>> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
>>
>> Cc: <stable@vger.kernel.org>
>> Cc: Jan Kara <jack@suse.cz>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> ---
>>
>> Changes in v2:
>> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly.
>>
>>  fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 47 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index f17c3d33fb18..17469fc7b20e 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode,
>>  	return ret;
>>  }
>>
>> +/*
>> + * zero out partial blocks of one cluster.
>> + *
>> + * start: file offset where zero starts, will be made upper block aligned.
>> + * len: it will be trimmed to the end of current cluster if "start + len"
>> + *      is bigger than it.
> You write this here but ...
>
>> + */
>> +static int ocfs2_zeroout_partial_cluster(struct inode *inode,
>> +					u64 start, u64 len)
>> +{
>> +	int ret;
>> +	u64 start_block, end_block, nr_blocks;
>> +	u64 p_block, offset;
>> +	u32 cluster, p_cluster, nr_clusters;
>> +	struct super_block *sb = inode->i_sb;
>> +	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
>> +
>> +	if (start + len < end)
>> +		end = start + len;
> ... here you check actually something else and I don't see where else would
> the trimming happen.

Before the "if", end = ocfs2_align_bytes_to_clusters(sb, start), which is
the end of the cluster where "start" is located.

Thanks,

Junxiao.

>
> 								Honza
>
>> +
>> +	start_block = ocfs2_blocks_for_bytes(sb, start);
>> +	end_block = ocfs2_blocks_for_bytes(sb, end);
>> +	nr_blocks = end_block - start_block;
>> +	if (!nr_blocks)
>> +		return 0;
>> +
>> +	cluster = ocfs2_bytes_to_clusters(sb, start);
>> +	ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
>> +				&nr_clusters, NULL);
>> +	if (ret)
>> +		return ret;
>> +	if (!p_cluster)
>> +		return 0;
>> +
>> +	offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
>> +	p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
>> +	return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
>> +}
>> +
>>  /*
>>   * Parts of this function taken from xfs_change_file_space()
>>   */
>> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  {
>>  	int ret;
>>  	s64 llen;
>> -	loff_t size;
>> +	loff_t size, orig_isize;
>>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>  	struct buffer_head *di_bh = NULL;
>>  	handle_t *handle;
>> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  		goto out_inode_unlock;
>>  	}
>>
>> +	orig_isize = i_size_read(inode);
>>  	switch (sr->l_whence) {
>>  	case 0: /*SEEK_SET*/
>>  		break;
>> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  		sr->l_start += f_pos;
>>  		break;
>>  	case 2: /*SEEK_END*/
>> -		sr->l_start += i_size_read(inode);
>> +		sr->l_start += orig_isize;
>>  		break;
>>  	default:
>>  		ret = -EINVAL;
>> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
>>  	default:
>>  		ret = -EINVAL;
>>  	}
>> +
>> +	/* zeroout eof blocks in the cluster. */
>> +	if (!ret && change_size && orig_isize < size)
>> +		ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
>> +					size - orig_isize);
>>  	up_write(&OCFS2_I(inode)->ip_alloc_sem);
>>  	if (ret) {
>>  		mlog_errno(ret);
>> --
>> 2.24.3 (Apple Git-128)
>>
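
The trimming described in the reply above can be checked with plain arithmetic. The sketch below is a userspace model, not kernel code: the 4 KiB block size and 64 KiB cluster size are assumed purely for illustration, and `align_up_to_cluster()` and `blocks_for_bytes()` are hypothetical stand-ins that assume the corresponding kernel helpers round up to a cluster boundary and up to a block boundary respectively.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry, for illustration only. */
#define BLOCK_BITS   12u   /* 4 KiB blocks */
#define CLUSTER_BITS 16u   /* 64 KiB clusters */

/* Hypothetical stand-in: round a byte offset up to a cluster boundary. */
static uint64_t align_up_to_cluster(uint64_t bytes)
{
	uint64_t mask = ((uint64_t)1 << CLUSTER_BITS) - 1;

	return (bytes + mask) & ~mask;
}

/* Hypothetical stand-in: number of blocks needed to cover `bytes`. */
static uint64_t blocks_for_bytes(uint64_t bytes)
{
	return (bytes + ((uint64_t)1 << BLOCK_BITS) - 1) >> BLOCK_BITS;
}

struct zero_range {
	uint64_t start_block;  /* first logical block to zero */
	uint64_t nr_blocks;    /* how many blocks to zero */
};

/* Mirrors the range computation at the top of the quoted function. */
struct zero_range partial_cluster_range(uint64_t start, uint64_t len)
{
	struct zero_range r;
	uint64_t end = align_up_to_cluster(start);  /* end of start's cluster */

	if (start + len < end)      /* request stops before the boundary */
		end = start + len;
	assert(end >= start);

	r.start_block = blocks_for_bytes(start);
	r.nr_blocks = blocks_for_bytes(end) - r.start_block;
	return r;
}
```

With these assumed sizes, a request starting at byte 70000 with a length of 200000 is cut off at the cluster boundary 131072 and covers 14 blocks starting at block 18, while a length of 5000 stays inside the cluster and covers a single block. A cluster-aligned start such as 131072 yields zero blocks, which matches the early `return 0` in the quoted code.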
* Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix data corruption by fallocate @ 2021-05-24 16:14 ` Junxiao Bi 0 siblings, 0 replies; 20+ messages in thread From: Junxiao Bi @ 2021-05-24 16:14 UTC (permalink / raw) To: Jan Kara; +Cc: linux-fsdevel, ocfs2-devel On 5/24/21 1:55 AM, Jan Kara wrote: > On Fri 21-05-21 16:36:12, Junxiao Bi wrote: >> When fallocate punches holes out of inode size, if original isize is in >> the middle of last cluster, then the part from isize to the end of the >> cluster will be zeroed with buffer write, at that time isize is not >> yet updated to match the new size, if writeback is kicked in, it will >> invoke ocfs2_writepage()->block_write_full_page() where the pages out >> of inode size will be dropped. That will cause file corruption. Fix >> this by zero out eof blocks when extending the inode size. >> >> Running the following command with qemu-image 4.2.1 can get a corrupted >> coverted image file easily. >> >> qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ >> -O qcow2 -o compat=1.1 $qcow_image.conv >> >> The usage of fallocate in qemu is like this, it first punches holes out of >> inode size, then extend the inode size. >> >> fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 >> fallocate(11, 0, 2276196352, 65536) = 0 >> >> v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html >> >> Cc: <stable@vger.kernel.org> >> Cc: Jan Kara <jack@suse.cz> >> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> >> --- >> >> Changes in v2: >> - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly. 
>> >> fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-- >> 1 file changed, 47 insertions(+), 2 deletions(-) >> >> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >> index f17c3d33fb18..17469fc7b20e 100644 >> --- a/fs/ocfs2/file.c >> +++ b/fs/ocfs2/file.c >> @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode, >> return ret; >> } >> >> +/* >> + * zero out partial blocks of one cluster. >> + * >> + * start: file offset where zero starts, will be made upper block aligned. >> + * len: it will be trimmed to the end of current cluster if "start + len" >> + * is bigger than it. > You write this here but ... > >> + */ >> +static int ocfs2_zeroout_partial_cluster(struct inode *inode, >> + u64 start, u64 len) >> +{ >> + int ret; >> + u64 start_block, end_block, nr_blocks; >> + u64 p_block, offset; >> + u32 cluster, p_cluster, nr_clusters; >> + struct super_block *sb = inode->i_sb; >> + u64 end = ocfs2_align_bytes_to_clusters(sb, start); >> + >> + if (start + len < end) >> + end = start + len; > ... here you check actually something else and I don't see where else would > the trimming happen. Before the "if", end = ocfs2_align_bytes_to_clusters(sb, start), that is the end of the cluster where "start" located. Thanks, Junxiao. 
> > Honza > >> + >> + start_block = ocfs2_blocks_for_bytes(sb, start); >> + end_block = ocfs2_blocks_for_bytes(sb, end); >> + nr_blocks = end_block - start_block; >> + if (!nr_blocks) >> + return 0; >> + >> + cluster = ocfs2_bytes_to_clusters(sb, start); >> + ret = ocfs2_get_clusters(inode, cluster, &p_cluster, >> + &nr_clusters, NULL); >> + if (ret) >> + return ret; >> + if (!p_cluster) >> + return 0; >> + >> + offset = start_block - ocfs2_clusters_to_blocks(sb, cluster); >> + p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset; >> + return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS); >> +} >> + >> /* >> * Parts of this function taken from xfs_change_file_space() >> */ >> @@ -1865,7 +1904,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >> { >> int ret; >> s64 llen; >> - loff_t size; >> + loff_t size, orig_isize; >> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> struct buffer_head *di_bh = NULL; >> handle_t *handle; >> @@ -1896,6 +1935,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >> goto out_inode_unlock; >> } >> >> + orig_isize = i_size_read(inode); >> switch (sr->l_whence) { >> case 0: /*SEEK_SET*/ >> break; >> @@ -1903,7 +1943,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >> sr->l_start += f_pos; >> break; >> case 2: /*SEEK_END*/ >> - sr->l_start += i_size_read(inode); >> + sr->l_start += orig_isize; >> break; >> default: >> ret = -EINVAL; >> @@ -1957,6 +1997,11 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode, >> default: >> ret = -EINVAL; >> } >> + >> + /* zeroout eof blocks in the cluster. 
*/ >> + if (!ret && change_size && orig_isize < size) >> + ret = ocfs2_zeroout_partial_cluster(inode, orig_isize, >> + size - orig_isize); >> up_write(&OCFS2_I(inode)->ip_alloc_sem); >> if (ret) { >> mlog_errno(ret); >> -- >> 2.24.3 (Apple Git-128) >> _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ^ permalink raw reply [flat|nested] 20+ messages in thread
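Editorial sketch (not part of the thread): the tail of the quoted ocfs2_zeroout_partial_cluster() turns a logical starting block into a physical one by carrying the block offset within the logical cluster over to the mapped physical cluster. The userspace model below illustrates only that arithmetic. The function names and the explicit block_bits / cluster_bits parameters are hypothetical stand-ins; in the kernel the geometry comes from the superblock and the mapping from ocfs2_get_clusters().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ocfs2_clusters_to_blocks(): number of the first
 * block of a cluster, given log2(block size) and log2(cluster size).
 */
static uint64_t clusters_to_blocks(unsigned int block_bits,
				   unsigned int cluster_bits,
				   uint32_t cluster)
{
	assert(block_bits <= cluster_bits);
	return (uint64_t)cluster << (cluster_bits - block_bits);
}

/*
 * Model of the last three statements of the quoted helper.
 *
 * start_block: first logical block to zero (already rounded up)
 * l_cluster:   logical cluster containing the start offset
 * p_cluster:   physical cluster it maps to; 0 is treated as "not allocated",
 *              mirroring the "if (!p_cluster) return 0;" check in the patch
 *
 * Returns the physical block to start zeroing at, or 0 if nothing is mapped.
 */
static uint64_t physical_start_block(unsigned int block_bits,
				     unsigned int cluster_bits,
				     uint64_t start_block,
				     uint32_t l_cluster,
				     uint32_t p_cluster)
{
	uint64_t offset;

	if (!p_cluster)
		return 0;

	/* Block offset of the start position inside its logical cluster. */
	offset = start_block -
		 clusters_to_blocks(block_bits, cluster_bits, l_cluster);

	/* The same offset applies inside the physical cluster. */
	return clusters_to_blocks(block_bits, cluster_bits, p_cluster) + offset;
}
```

With 4 KiB blocks and 1 MiB clusters (assumed values), a cluster holds 256 blocks, so logical block 778 in logical cluster 3 sits at offset 10 and lands on physical block 5 * 256 + 10 when the extent maps to physical cluster 5.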
* Re: [PATCH v2] ocfs2: fix data corruption by fallocate 2021-05-24 16:14 ` [Ocfs2-devel] " Junxiao Bi @ 2021-05-25 9:30 ` Jan Kara -1 siblings, 0 replies; 20+ messages in thread From: Jan Kara @ 2021-05-25 9:30 UTC (permalink / raw) To: Junxiao Bi; +Cc: Jan Kara, ocfs2-devel, joseph.qi, linux-fsdevel On Mon 24-05-21 09:14:16, Junxiao Bi wrote: > On 5/24/21 1:55 AM, Jan Kara wrote: > > > On Fri 21-05-21 16:36:12, Junxiao Bi wrote: > > > When fallocate punches holes out of inode size, if original isize is in > > > the middle of last cluster, then the part from isize to the end of the > > > cluster will be zeroed with buffer write, at that time isize is not > > > yet updated to match the new size, if writeback is kicked in, it will > > > invoke ocfs2_writepage()->block_write_full_page() where the pages out > > > of inode size will be dropped. That will cause file corruption. Fix > > > this by zero out eof blocks when extending the inode size. > > > > > > Running the following command with qemu-image 4.2.1 can get a corrupted > > > coverted image file easily. > > > > > > qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ > > > -O qcow2 -o compat=1.1 $qcow_image.conv > > > > > > The usage of fallocate in qemu is like this, it first punches holes out of > > > inode size, then extend the inode size. > > > > > > fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 > > > fallocate(11, 0, 2276196352, 65536) = 0 > > > > > > v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html > > > > > > Cc: <stable@vger.kernel.org> > > > Cc: Jan Kara <jack@suse.cz> > > > Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> > > > --- > > > > > > Changes in v2: > > > - suggested by Jan Kara, using sb_issue_zeroout to zero eof blocks in disk directly. 
> > > > > > fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++-- > > > 1 file changed, 47 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c > > > index f17c3d33fb18..17469fc7b20e 100644 > > > --- a/fs/ocfs2/file.c > > > +++ b/fs/ocfs2/file.c > > > @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode, > > > return ret; > > > } > > > +/* > > > + * zero out partial blocks of one cluster. > > > + * > > > + * start: file offset where zero starts, will be made upper block aligned. > > > + * len: it will be trimmed to the end of current cluster if "start + len" > > > + * is bigger than it. > > You write this here but ... > > > > > + */ > > > +static int ocfs2_zeroout_partial_cluster(struct inode *inode, > > > + u64 start, u64 len) > > > +{ > > > + int ret; > > > + u64 start_block, end_block, nr_blocks; > > > + u64 p_block, offset; > > > + u32 cluster, p_cluster, nr_clusters; > > > + struct super_block *sb = inode->i_sb; > > > + u64 end = ocfs2_align_bytes_to_clusters(sb, start); > > > + > > > + if (start + len < end) > > > + end = start + len; > > ... here you check actually something else and I don't see where else would > > the trimming happen. > > Before the "if", end = ocfs2_align_bytes_to_clusters(sb, start), that is > the end of the cluster where "start" located. Ah sorry, I got confused. The code is correct. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 20+ messages in thread
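Editorial sketch (not part of the thread): the trimming discussed above can be checked with plain arithmetic. "end" starts at the end of the cluster containing "start" and is pulled back to "start + len" only when the requested range stops earlier, so the zeroed range never leaves that cluster. The model below assumes both ocfs2_align_bytes_to_clusters() and ocfs2_blocks_for_bytes() round up, as the patch comment ("upper block aligned") and the reply suggest. The struct and function names are hypothetical, and the geometry values are illustrative rather than taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry; the kernel reads these from the superblock. */
struct geom {
	unsigned int block_bits;   /* log2(block size in bytes) */
	unsigned int cluster_bits; /* log2(cluster size in bytes) */
};

/* Round a byte offset up to a cluster boundary (assumed behaviour). */
static uint64_t align_bytes_to_clusters(const struct geom *g, uint64_t bytes)
{
	uint64_t csize = 1ULL << g->cluster_bits;

	return (bytes + csize - 1) & ~(csize - 1);
}

/* Number of blocks needed to cover "bytes" bytes, rounded up. */
static uint64_t blocks_for_bytes(const struct geom *g, uint64_t bytes)
{
	uint64_t bsize = 1ULL << g->block_bits;

	return (bytes + bsize - 1) >> g->block_bits;
}

/*
 * Model of the first half of the quoted helper: returns how many whole
 * blocks would be zeroed and stores the first such block in *first_block.
 */
static uint64_t partial_cluster_blocks(const struct geom *g, uint64_t start,
				       uint64_t len, uint64_t *first_block)
{
	/* End of the cluster in which "start" is located. */
	uint64_t end = align_bytes_to_clusters(g, start);

	assert(g->block_bits <= g->cluster_bits);

	/* Trim only if the request ends before the cluster does. */
	if (start + len < end)
		end = start + len;

	*first_block = blocks_for_bytes(g, start);
	return blocks_for_bytes(g, end) - *first_block;
}
```

For example, with 4 KiB blocks and 1 MiB clusters, start = 5000 and len = 3000000 give end = 1048576, so blocks 2 through 255 (254 blocks) are zeroed. With len = 1000 the range ends inside block 1, so no whole block is covered and the result is 0, matching the "if (!nr_blocks) return 0;" early exit in the patch.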
end of thread, other threads:[~2021-05-26 5:18 UTC | newest]

Thread overview: 20+ messages
2021-05-21 23:36 [PATCH v2] ocfs2: fix data corruption by fallocate Junxiao Bi
2021-05-21 23:36 ` [Ocfs2-devel] " Junxiao Bi
2021-05-23 11:52 ` Joseph Qi
2021-05-23 11:52 ` [Ocfs2-devel] " Joseph Qi
2021-05-24 16:23 ` Junxiao Bi
2021-05-24 16:23 ` [Ocfs2-devel] " Junxiao Bi
2021-05-25 2:04 ` Joseph Qi
2021-05-25 2:04 ` [Ocfs2-devel] " Joseph Qi
2021-05-25 17:58 ` Junxiao Bi
2021-05-25 17:58 ` [Ocfs2-devel] " Junxiao Bi
2021-05-26 2:11 ` Joseph Qi
2021-05-26 2:11 ` [Ocfs2-devel] " Joseph Qi
2021-05-26 5:10 ` Junxiao Bi
2021-05-26 5:10 ` [Ocfs2-devel] " Junxiao Bi
2021-05-24 8:55 ` Jan Kara
2021-05-24 8:55 ` [Ocfs2-devel] " Jan Kara
2021-05-24 16:14 ` Junxiao Bi
2021-05-24 16:14 ` [Ocfs2-devel] " Junxiao Bi
2021-05-25 9:30 ` Jan Kara
2021-05-25 9:30 ` [Ocfs2-devel] " Jan Kara