On 2019/1/7 8:55 PM, gius db wrote:
> On Mon, 7 Jan 2019 at 00:56, Qu Wenruo wrote:
> ]zac[
>>> I am quite convinced that it happens during the snapshot delete and
>>> the subsequent cleanup.
>>> And maybe even the umount is part of the problem.
>>
>> No, I mean the corruption which finally results in the hang had been
>> there for a long time.
>>
>> It's relatively common for the extent tree to get corrupted earlier,
>> and then some unfortunate operation touching the corrupted extent
>> tree triggers a user-visible error.
>
> Yes, I understand, but the use of this filesystem is very specific.
>
> This filesystem, and the others that have had corruption problems,
> are used only for backups.
> So the only operations performed are snapshot receive, snapshot
> create, and snapshot delete.
> After the operations are finished, the filesystem is unmounted.
>
> It may just be a coincidence, but the corruption problems have very
> often occurred after a snapshot delete.

I think this gives us a pretty good clue: a specific workload, less
active usage, and normally no concurrency.

And for the backup usage, you're using relatively new kernels only,
right?

Then this is something to take into consideration for stress testing
(see the sketch at the end of this mail).

Thanks,
Qu

>
> The other filesystems that are used in a generic way (operating
> system, warehouse and data processing, snapshot creation, etc.) have
> never given problems.
>
> ]zac[
>>>
>>> btrfs check reported various corruptions and fixed them.
>>
>> Please paste the output if possible.
> ]zac[
>
> Sorry, I didn't think to save the btrfs check messages.
>
>
> Gdb
>
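
For reference, a minimal Python sketch of such a stress loop, driving the
same create/send/receive/delete/umount cycle described in the report. All
device paths and subvolume names below are hypothetical placeholders, not
the reporter's actual setup:

#!/usr/bin/env python3
# Stress-test sketch: repeat the reported backup cycle many times.
# SRC, BACKUP_DEV and BACKUP_MNT are hypothetical placeholders.
import subprocess

SRC = "/data"               # source subvolume (placeholder)
BACKUP_DEV = "/dev/sdb1"    # backup filesystem device (placeholder)
BACKUP_MNT = "/mnt/backup"  # backup mount point (placeholder)

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

for i in range(100):
    snap = f"{SRC}/.snap-{i}"
    run("mount", BACKUP_DEV, BACKUP_MNT)
    # snapshot create: read-only snapshot of the source subvolume
    run("btrfs", "subvolume", "snapshot", "-r", SRC, snap)
    # snapshot receive: full send into the backup filesystem
    send = subprocess.Popen(["btrfs", "send", snap], stdout=subprocess.PIPE)
    subprocess.run(["btrfs", "receive", BACKUP_MNT],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    send.wait()
    # snapshot delete on the backup side, as in the report
    if i > 0:
        run("btrfs", "subvolume", "delete", f"{BACKUP_MNT}/.snap-{i - 1}")
    # umount right after the delete, so the async subvolume cleanup is
    # interrupted -- the delete + umount sequence the reporter suspects
    run("umount", BACKUP_MNT)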