Linux-BTRFS Archive on lore.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Timothy Pearson <tpearson@raptorengineering.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Unusual crash -- data rolled back ~2 weeks?
Date: Sun, 10 Nov 2019 14:54:55 +0800
Message-ID: <64be1293-5845-4054-8d5f-b9ff79168a17@gmx.com>
In-Reply-To: <1848426246.125326.1573368477888.JavaMail.zimbra@raptorengineeringinc.com>


On 2019/11/10 2:47 PM, Timothy Pearson wrote:
> 
> 
> ----- Original Message -----
>> From: "Qu Wenruo" <quwenruo.btrfs@gmx.com>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>, "linux-btrfs" <linux-btrfs@vger.kernel.org>
>> Sent: Saturday, November 9, 2019 9:38:21 PM
>> Subject: Re: Unusual crash -- data rolled back ~2 weeks?
> 
>> On 2019/11/10 6:33 AM, Timothy Pearson wrote:
>>> We just experienced a very unusual crash on a Linux 5.3 file server using NFS to
>>> serve a BTRFS filesystem.  NFS went into deadlock (D wait) with no apparent
>>> underlying disk subsystem problems, and when the server was hard rebooted to
>>> clear the D wait the BTRFS filesystem remounted itself in the state that it was
>>> in approximately two weeks earlier (!).
>>
>> This means that btrfs did not commit a transaction during those two weeks.
> 
> Is there any hope of getting the data from that interval back via btrfs-recover or a similar tool, or does the lack of commit mean the data was stored in RAM only and is therefore gone after the server reboot?

If a deadlock was preventing new transactions from being committed, then
no metadata was written back to disk at all, so there is nothing to
recover the metadata from. You may be able to find some of the data
blocks on disk, but without the metadata that references them they are
of little use.
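
To illustrate why a remount can show old state, here is a toy Python
model of commit semantics. It is only a sketch: ToyCowFs and its methods
are made-up names, this is not btrfs code, and it ignores nearly
everything a real filesystem does. The assumption it encodes is simply
that changes become visible after a reboot only once a commit has
completed.

```python
class ToyCowFs:
    """Toy model of transaction commits; hypothetical, not btrfs code."""

    def __init__(self):
        self.generation = 0   # number of commits that reached "disk"
        self.on_disk = {}     # state reachable from the last commit
        self.in_memory = {}   # changes of the running, uncommitted transaction

    def write(self, name, data):
        # New data only joins the running transaction.
        self.in_memory[name] = data

    def read(self, name):
        # A running system sees uncommitted changes first.
        if name in self.in_memory:
            return self.in_memory[name]
        return self.on_disk.get(name)

    def commit(self):
        # Only here does the new state become durable.
        self.on_disk = {**self.on_disk, **self.in_memory}
        self.in_memory = {}
        self.generation += 1

    def crash_and_remount(self):
        # Everything that was never committed is lost.
        self.in_memory = {}


fs = ToyCowFs()
fs.write("a.txt", "old")
fs.commit()
fs.write("a.txt", "new")        # the commit for this write never happens
seen_before = fs.read("a.txt")  # clients still see the new contents
fs.crash_and_remount()
seen_after = fs.read("a.txt")   # after reboot, the last committed state
print(seen_before, seen_after, fs.generation)
```

In this model the server keeps serving the new contents for as long as
it stays up, which would match a problem that goes unnoticed until the
reboot.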

> 
> If the latter, I'm somewhat surprised given the I/O load on the disk array in question, but it would also offer a clue as to why it hard locked the filesystem eventually (presumably on memory exhaustion -- the server has something like 128GB of RAM, so it could go quite a while before hitting the physical RAM limits).
> 
>>
>>>  There was also significant corruption of certain files (e.g. LDAP MDB and MySQL
>>>  InnoDB) noted -- we restored from backup for those files, but are concerned
>>>  about the status of the entire filesystem at this point.
>>
>> A btrfs check run is needed to make sure there is no metadata corruption.
>>
>> Also, we need the sysrq+w output to determine where the deadlock is.
>> Without it, it's really hard to find any clue from the report.
> 
> It would have been gathered if we'd known the filesystem was in this bad state.  At the time, the priority was on restoring service and we had assumed NFS had just wedged itself (again).  It was only after reboot and remount that the damage slowly came to light.
> 
> Do the described symptoms (what little we know of them at this point) line up with the issues fixed by https://patchwork.kernel.org/patch/11141559/ ?  Right now we're hoping that this particular issue was fixed by that series, but if not we might consider increasing backup frequency to nightly for this particular array and seeing if it happens again.

That fix is already in v5.3, so I don't think that is the cause here.

Thanks,
Qu

> 
> Thanks!
> 


Thread overview: 14+ messages
2019-11-09 22:33 Timothy Pearson
2019-11-09 22:48 ` Timothy Pearson
2019-11-10  3:38 ` Qu Wenruo
2019-11-10  6:47   ` Timothy Pearson
2019-11-10  6:54     ` Qu Wenruo [this message]
2019-11-10  7:18       ` Timothy Pearson
2019-11-10  7:45         ` Qu Wenruo
2019-11-10  7:48           ` Timothy Pearson
2019-11-10 10:02           ` Timothy Pearson
2019-11-10 20:10             ` Zygo Blaxell
2019-11-11 23:28           ` Timothy Pearson
2019-11-11 23:33             ` Timothy Pearson
2019-11-12 11:30             ` Chris Murphy
2019-11-10  8:04         ` Andrei Borzenkov
