On 2019/11/22 9:20 PM, devel@roosoft.ltd.uk wrote:
> On 22/11/2019 13:10, Qu Wenruo wrote:
>>
>> On 2019/11/22 8:37 PM, devel@roosoft.ltd.uk wrote:
>>> So been discussing this on IRC but looks like more sage advice is needed.
>> You're not the only one hitting the bug. (Not sure if that makes you
>> feel a little better)
>
>
>
> Hehe.. well always help to know you are not slowly going crazy by oneself.
>
>>>
>>> The csum error is from data reloc tree, which is a tree to record the
>>> new (relocated) data.
>>> So the good news is, your old data is not corrupted, and since we hit
>>> EIO before switching tree blocks, the corrupted data is just deleted.
>>>
>>> And I have also seen the bug just using single device, with DUP meta and
>>> SINGLE data, so I believe there is something wrong with the data reloc tree.
>>> The problem here is, I can't find a way to reproduce it, so it will take
>>> us a longer time to debug.
>>>
>>>
>>> Despite that, have you seen any other problem? Especially ENOSPC (needs
>>> enospc_debug mount option).
>>> The only time I hit it, I was debugging ENOSPC bug of relocation.
>>>
>
> As far as I can tell the rest of the filesystem works normally. Like I
> show scrubs clean etc.. I have not actively added much new data since
> the whole point is to balance the fs so a scrub does not take 18 hours.

Sorry, my point here is: would you like to try the balance again with
the "enospc_debug" mount option?
During balance, we can hit ENOSPC without it being reported, as long as
a more serious error, like the EIO you hit, gets reported instead.

>
>
> So really I am not sure what to do. It only seems to appear during a
> balance, which as far as I know is a much needed regular maintenance
> tool to keep a fs healthy, which is why it is part of the
> btrfsmaintenance tools

You don't need to be that nervous just because you can't balance.
Nowadays, balance is no longer that necessary.
In the old days, balance was the only way to delete empty block groups,
but now empty block groups are removed automatically, so balance is
only needed to fix unbalanced disk usage or to convert profiles.

In your case, although the imbalanced disk usage isn't comfortable, it
won't hurt too much.
So for now, you can just disable balance and call it a day.
As long as you're still writing into that fs, it should become more and
more balanced.

>
> Are there some other tests to try and isolate what the problem appears
> to be?

Forgot to mention: is it always reproducible? And is it always on the
same block group?

Thanks,
Qu

>
>
> Thanks.
>