From: Jan Kara <jack@suse.cz>
To: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Jan Kara <jack@suse.cz>,
	ocfs2-devel@oss.oracle.com, joseph.qi@linux.alibaba.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2] ocfs2: fix data corruption by fallocate
Date: Tue, 25 May 2021 11:30:34 +0200
Message-ID: <20210525093034.GB4112@quack2.suse.cz>
In-Reply-To: <479301ea-042b-855d-fc52-0d7bbdc55bdc@oracle.com>

On Mon 24-05-21 09:14:16, Junxiao Bi wrote:
> On 5/24/21 1:55 AM, Jan Kara wrote:
> 
> > On Fri 21-05-21 16:36:12, Junxiao Bi wrote:
> > > > When fallocate punches holes beyond the inode size and the original
> > > > isize is in the middle of the last cluster, the part from isize to the
> > > > end of the cluster is zeroed with a buffered write. At that time isize
> > > > has not yet been updated to match the new size, so if writeback kicks
> > > > in, it invokes ocfs2_writepage()->block_write_full_page(), where the
> > > > pages beyond the inode size are dropped. That causes file corruption.
> > > > Fix this by zeroing out the EOF blocks when extending the inode size.
> > > 
> > > > Running the following command with qemu-img 4.2.1 easily produces a
> > > > corrupted converted image file.
> > > 
> > >      qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
> > >               -O qcow2 -o compat=1.1 $qcow_image.conv
> > > 
> > > > qemu uses fallocate like this: it first punches holes beyond the
> > > > inode size, then extends the inode size.
> > > 
> > >      fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
> > >      fallocate(11, 0, 2276196352, 65536) = 0
> > > 
> > > v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
> > > 
> > > Cc: <stable@vger.kernel.org>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> > > ---
> > > 
> > > Changes in v2:
> > > > - As suggested by Jan Kara, use sb_issue_zeroout() to zero the EOF blocks on disk directly.
> > > 
> > >   fs/ocfs2/file.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
> > >   1 file changed, 47 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> > > index f17c3d33fb18..17469fc7b20e 100644
> > > --- a/fs/ocfs2/file.c
> > > +++ b/fs/ocfs2/file.c
> > > @@ -1855,6 +1855,45 @@ int ocfs2_remove_inode_range(struct inode *inode,
> > >   	return ret;
> > >   }
> > > +/*
> > > + * zero out partial blocks of one cluster.
> > > + *
> > > + * start: file offset where zero starts, will be made upper block aligned.
> > > + * len: it will be trimmed to the end of current cluster if "start + len"
> > > + *      is bigger than it.
> > You write this here but ...
> > 
> > > + */
> > > +static int ocfs2_zeroout_partial_cluster(struct inode *inode,
> > > +					u64 start, u64 len)
> > > +{
> > > +	int ret;
> > > +	u64 start_block, end_block, nr_blocks;
> > > +	u64 p_block, offset;
> > > +	u32 cluster, p_cluster, nr_clusters;
> > > +	struct super_block *sb = inode->i_sb;
> > > +	u64 end = ocfs2_align_bytes_to_clusters(sb, start);
> > > +
> > > +	if (start + len < end)
> > > +		end = start + len;
> > > ... here you actually check something else, and I don't see where else
> > > the trimming would happen.
> 
> Before the "if", end = ocfs2_align_bytes_to_clusters(sb, start), which is
> the end of the cluster where "start" is located.

Ah sorry, I got confused. The code is correct.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
