On 2019/11/10 6:33 AM, Timothy Pearson wrote:
> We just experienced a very unusual crash on a Linux 5.3 file server using NFS to serve a BTRFS filesystem.  NFS went into deadlock (D wait) with no apparent underlying disk subsystem problems, and when the server was hard rebooted to clear the D wait the BTRFS filesystem remounted itself in the state that it was in approximately two weeks earlier (!).

This means btrfs did not commit a single transaction during those two
weeks, so the on-disk state never advanced past that point.

>  There was also significant corruption of certain files (e.g. LDAP MDB and MySQL InnoDB) noted -- we restored from backup for those files, but are concerned about the status of the entire filesystem at this point.

A btrfs check run is needed to make sure there is no metadata corruption.
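
For reference, a minimal sketch of a read-only check. The check_fs
wrapper name and the device path are only placeholders, and the
filesystem should be unmounted first. --readonly is the default mode
and makes no changes to the disk.

```shell
# Run a read-only btrfs check on the given (unmounted) block device.
# check_fs is a hypothetical helper name, not part of btrfs-progs.
check_fs() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: check_fs <device>" >&2
        return 2
    fi
    # --readonly: only report problems, never attempt a repair.
    btrfs check --readonly "$dev"
}

# Example (placeholder device name, run as root):
#   check_fs /dev/sdX
```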

Also, we need the sysrq+w output to determine where the deadlock is.
Without it, it's really hard to get any clue from the report.
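
If the console keyboard is not available, something like the sketch
below should produce the same dump. It has to be run as root while the
tasks are still stuck in D state. The function name and the output file
name are only illustrative.

```shell
# Ask the kernel to dump all blocked (D state) tasks, then save the
# kernel log. dump_blocked_tasks is a hypothetical helper name; both
# arguments are optional and default to the usual locations.
dump_blocked_tasks() {
    trigger="${1:-/proc/sysrq-trigger}"
    out="${2:-sysrq-w.txt}"
    # Writing "w" here is the same as pressing sysrq+w on the console.
    echo w > "$trigger" || return 1
    # The blocked-task backtraces end up in the kernel log.
    dmesg > "$out"
}

# Example (run as root during the hang):
#   dump_blocked_tasks
```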

Thanks,
Qu

> 
> We do not use subvolumes, snapshots, or any of the advanced features of BTRFS beyond the data checksumming.  I am at a loss as to how BTRFS could suddenly just "forget" about the past two weeks of written data and (mostly) cleanly roll back on the next mount without even throwing any warnings in dmesg.
> 
> Any thoughts on how this is possible, and if there is any chance of getting the lost couple weeks of data back, would be appreciated.
> 
> Thank you!
>