From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mout.gmx.net ([212.227.17.22]:34361 "EHLO mout.gmx.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752524AbeEOIbF (ORCPT ); Tue, 15 May 2018 04:31:05 -0400 Subject: Re: [PATCH 1/2] btrfs: inode: Don't compress if NODATASUM or NODATACOW set To: Nikolay Borisov , Qu Wenruo , linux-btrfs@vger.kernel.org References: <20180515073622.18732-1-wqu@suse.com> <20180515073622.18732-2-wqu@suse.com> <9c4e5071-d83b-01db-2c2f-518ee1650713@suse.com> From: Qu Wenruo Message-ID: Date: Tue, 15 May 2018 16:30:54 +0800 MIME-Version: 1.0 In-Reply-To: <9c4e5071-d83b-01db-2c2f-518ee1650713@suse.com> Content-Type: text/plain; charset=utf-8 Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 2018年05月15日 16:21, Nikolay Borisov wrote: > > > On 15.05.2018 10:36, Qu Wenruo wrote: >> As btrfs(5) specified: >> >> Note >> If nodatacow or nodatasum are enabled, compression is disabled. >> >> If NODATASUM or NODATACOW set, we should not compress the extent. >> >> Normally NODATACOW is detected properly in run_delalloc_range() so >> compression won't happen for NODATACOW. >> >> However for NODATASUM we don't have any check, and it can cause >> compressed extent without csum pretty easily, just by: >> ------ >> mkfs.btrfs -f $dev >> mount $dev $mnt -o nodatasum >> touch $mnt/foobar >> mount -o remount,datasum,compress $mnt >> xfs_io -f -c "pwrite 0 128K" $mnt/foobar >> ------ >> >> And in fact, we have bug report about corrupted compressed extent >> without proper data checksum so even RAID1 can't recover the corruption. >> (https://bugzilla.kernel.org/show_bug.cgi?id=199707) >> >> Running compression without proper checksum could cause more damage when >> corruption happens, so there is no need to allow compression for >> NODATACSUM. >> >> Reported-by: James Harvey >> Signed-off-by: Qu Wenruo >> --- >> fs/btrfs/inode.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index d241285a0d2a..dbef3f404559 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode *inode, u64 start, u64 end) >> { >> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); >> >> + /* >> + * Btrfs doesn't support compression without csum or CoW. >> + * This should have the highest priority. >> + */ >> + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW || >> + BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) >> + return 0; >> + > > How is this not buggy, given that if inode_need_compress as called from > compress_file_range will return zero, meaning we jump to cont: label. > Then in the case of an inline extent we can execute : In that case, you won't go into compress_file_range() at all. As the only caller of compress_file_range() is async_cow_start(), which get queued in cow_file_range_async(). And cow_file_range_async() traces back to run_delalloc_range(). Here we determine (basically) where some dirty range goes. The modification in inode_need_compress() mostly affects the decision in run_delalloc_range(), so we won't go cow_file_range_async(), thus we won't hit the problem you described. > > ret = cow_file_range_inline(inode, start, end, > total_compressed, > compress_type, pages); > > where compress_type would have been set at the beginning of the > function unconditionally to fs_info->compress_type. > > For non-inline extents I guess we are ok, given that will_compress > will not be set. However, this code is rather messy and I'm not sure > it's well defined what's going to happen in this case with inline extents. > > OTOH, I think there is something fundamentally wrong in calling > inode_need_compress in compress_file_range. I.e they work at different > abstractions. IMO compress_file_range should only be called if we know > we have to compress the range. > > So looking around the code in run_delalloc_range (the only function > which calls cow_file_range_async) we already have : > > } else if (!inode_need_compress(inode, start, end)) { > ret = cow_file_range(inode, locked_page, start, end, end, > page_started, nr_written, 1, NULL); > > and in the else branch we have the cow_file_range_async. So the code > is sort of half-way there to actually decoupling compression checking from > performing the actual compression. > > >> /* force compress */ >> if (btrfs_test_opt(fs_info, FORCE_COMPRESS)) >> return 1; > > One more thing, in inode_need_compress shouldn't the inode specific > checks come first something like : > > > static inline int inode_need_compress(struct inode *inode, u64 start, u64 end) > { > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > > /* defrag ioctl */ > if (BTRFS_I(inode)->defrag_compress) > return 1; > /* bad compression ratios */ > if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) > return 0; Not exactly. Force-compress should less us bypass bad compression ratio, so it should be at least before ratio check. Thanks, Qu > /* force compress */ > if (btrfs_test_opt(fs_info, FORCE_COMPRESS)) > return 1; > if (btrfs_test_opt(fs_info, COMPRESS) || > BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS || > BTRFS_I(inode)->prop_compress) > return btrfs_compress_heuristic(inode, start, end); > return 0; > } > >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >