On 2019/6/28 2:56 PM, Anand Jain wrote:
>
>
>> On 28 Jun 2019, at 1:58 PM, Qu Wenruo wrote:
>>
>>
>>
>> On 2019/6/28 10:47 AM, Anand Jain wrote:
>>> On 27/6/19 10:58 PM, David Sterba wrote:
>>>> On Tue, Jun 25, 2019 at 04:24:57PM +0800, Qu Wenruo wrote:
>>>>> Ping?
>>>>>
>>>>> This patch should fix the problem of compressed extent even when
>>>>> nodatasum is set.
>>>>>
>>>>> It has been one year but we still didn't get a conclusion on where
>>>>> force_compress should behave.
>>>>
>>>> Note that pings to patches sent year ago will get lost, I noticed only
>>>> because you resent it and I remembered that we had some discussions,
>>>> without conclusions.
>>>>
>>>>> But at least to me, NODATASUM is a strong exclusion for compress, no
>>>>> matter whatever option we use, we should NEVER compress data without
>>>>> datasum/datacow.
>>>>
>>>> That's correct,
>>>
>>> But I wonder what's the reason that datasum/datacow is prerequisite for
>>> the compression ?
>>
>> It's easy to understand the hard requirement for data COW.
>>
>> If you overwrite compressed extent, a powerloss before transaction
>> commit could easily screw up the data.
>
> This scenario is even true for non compressed data, right?

Not exactly. For non-compressed data, it's either:

1) NODATACOW (implies NODATASUM)
   We don't know whether the new data is correct or not, but we can
   still READ it out and let user space decide.

2) DATACOW (no matter whether it has data sum or not)
   The old data is not touched at all, so nothing but the uncommitted
   data is lost.

But for compressed NODATACOW data it's more serious: we CANNOT read
the data out at all, because the on-disk data must follow the
compression format. Even if we only overwrite the first 4K of a 128K
extent whose compressed on-disk size is 16K, the whole 128K extent
can no longer be read.

So it's way more serious than for a non-compressed extent.

>
>> Furthermore, what will happen if you're overwriting a 16K data extent
>> while its original compressed size is only 4K, while the new data after
>> compression is 8K?
>
> Sorry I can’t think of any limitation, any idea?

We don't know what the compressed extent size will be until we have
actually compressed the data.

So if we allowed overwriting compressed extents, how would we decide
whether to CoW or not at delalloc time?

>
>> For checksum, I'd say it's not as a hard requirement as data cow, but
>> still a very preferred one.
>>
>> Since compressed data corruption could cause more problem, e.g. one bit
>> corruption can cause the whole extent to be corrupted, it's highly
>> recommended to have checksum to protect them.
>
> Isn’t it true that compression boundary/granularity is an extent,
> so the corrupted extent remains corrupted the same way after the
> decompress,

A corrupted compressed extent may not even pass decompression.

> and corruption won’t perpetuate to the other extents
> in the file. But can a non-impending corruption due to external
> factors be the reason for datasum prerequisite? it can’t be IMO.

The corruption boundary is the WHOLE extent, not just the part where
the corruption sits.

That's the point, that's why it's more serious than plaintext extent
corruption, and that's why checksums are highly recommended.
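To make the "one flipped bit loses the whole extent" point concrete,
here is a small standalone userspace demo. It uses plain zlib rather
than the kernel's compression code, so the details differ from btrfs,
and the buffer sizes and data pattern are arbitrary, but the failure
mode is the same: compress a 128K buffer, flip a single bit somewhere
in the compressed stream, and the decompression of the whole buffer
fails, not just the part around the damage.

/*
 * Standalone demo (plain zlib, NOT btrfs code): compress a 128K
 * buffer, flip one bit in the compressed stream, then try to
 * decompress it.  Build with:  gcc -o bitflip bitflip.c -lz
 */
#include <stdio.h>
#include <zlib.h>

int main(void)
{
	static unsigned char plain[128 * 1024];	/* "uncompressed extent" */
	static unsigned char comp[256 * 1024];	/* compressed stream */
	static unsigned char out[128 * 1024];	/* decompression target */
	uLongf clen = sizeof(comp);
	uLongf olen = sizeof(out);
	size_t i;
	int ret;

	/* Fill with a repeating pattern so it compresses well. */
	for (i = 0; i < sizeof(plain); i++)
		plain[i] = (i * 31) & 0xff;

	ret = compress(comp, &clen, plain, sizeof(plain));
	printf("compress: ret=%d, %zu -> %lu bytes\n",
	       ret, sizeof(plain), (unsigned long)clen);

	/* One bit of "corruption" in the middle of the compressed data. */
	comp[clen / 2] ^= 0x01;

	ret = uncompress(out, &olen, comp, clen);
	printf("uncompress after 1-bit flip: %s (ret=%d)\n",
	       ret == Z_OK ? "OK" : "FAILED", ret);
	return 0;
}

With a single bit flipped inside the deflate stream the decoder
typically cannot recover, so the entire 128K is lost, not just the
damaged bytes. A partial overwrite of a compressed NODATACOW extent
after a powerloss puts the decompressor in the same position.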
Thanks,
Qu

>
> Thanks, Anand
>
>> Thanks,
>> Qu
>>>
>>> Thanks, Anand
>>>
>>>> but the way you fix it is IMO not right. This was also
>>>> noticed by Nikolay, that there are 2 locations that call
>>>> inode_need_compress but with different semantics.
>>>>
>>>> One is the decision if compression applies at all, and the second one
>>>> when that's certain it's compression, to do it or not based on the
>>>> status decision of eg. heuristics.
>>>>
>>>
>>
>