linux-btrfs.vger.kernel.org archive mirror
From: Timothy Pearson <tpearson@raptorengineering.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Unusual crash -- data rolled back ~2 weeks?
Date: Mon, 11 Nov 2019 17:33:20 -0600 (CST)	[thread overview]
Message-ID: <1405557885.533808.1573515200840.JavaMail.zimbra@raptorengineeringinc.com> (raw)
In-Reply-To: <741683181.533799.1573514917384.JavaMail.zimbra@raptorengineeringinc.com>



----- Original Message -----
> From: "Timothy Pearson" <tpearson@raptorengineering.com>
> To: "Qu Wenruo" <quwenruo.btrfs@gmx.com>
> Cc: "linux-btrfs" <linux-btrfs@vger.kernel.org>
> Sent: Monday, November 11, 2019 5:28:37 PM
> Subject: Re: Unusual crash -- data rolled back ~2 weeks?

> ----- Original Message -----
>> From: "Qu Wenruo" <quwenruo.btrfs@gmx.com>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>> Cc: "linux-btrfs" <linux-btrfs@vger.kernel.org>
>> Sent: Sunday, November 10, 2019 1:45:14 AM
>> Subject: Re: Unusual crash -- data rolled back ~2 weeks?
> 
>> On 2019/11/10 下午3:18, Timothy Pearson wrote:
>>> 
>>> 
>>> ----- Original Message -----
>>>> From: "Qu Wenruo" <quwenruo.btrfs@gmx.com>
>>>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>>>> Cc: "linux-btrfs" <linux-btrfs@vger.kernel.org>
>>>> Sent: Sunday, November 10, 2019 6:54:55 AM
>>>> Subject: Re: Unusual crash -- data rolled back ~2 weeks?
>>> 
>>>> On 2019/11/10 下午2:47, Timothy Pearson wrote:
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Qu Wenruo" <quwenruo.btrfs@gmx.com>
>>>>>> To: "Timothy Pearson" <tpearson@raptorengineering.com>, "linux-btrfs"
>>>>>> <linux-btrfs@vger.kernel.org>
>>>>>> Sent: Saturday, November 9, 2019 9:38:21 PM
>>>>>> Subject: Re: Unusual crash -- data rolled back ~2 weeks?
>>>>>
>>>>>> On 2019/11/10 上午6:33, Timothy Pearson wrote:
>>>>>>> We just experienced a very unusual crash on a Linux 5.3 file server using NFS to
>>>>>>> serve a BTRFS filesystem.  NFS went into deadlock (D wait) with no apparent
>>>>>>> underlying disk subsystem problems, and when the server was hard rebooted to
>>>>>>> clear the D wait the BTRFS filesystem remounted itself in the state that it was
>>>>>>> in approximately two weeks earlier (!).
>>>>>>
>>>>>> This means that during those two weeks, no btrfs transaction was
>>>>>> committed to disk.
>>>>>
>>>>> Is there any hope of getting the data from that interval back via btrfs-recover
>>>>> or a similar tool, or does the lack of commit mean the data was stored in RAM
>>>>> only and is therefore gone after the server reboot?
>>>>
>>>> If it's a deadlock preventing new transactions from being committed, then
>>>> no metadata was ever written back to disk, so there is no way to recover
>>>> the metadata.  You may find some data blocks that were written, but
>>>> without the metadata they are useless.
>>> 
>>> OK, I'll just assume the data written in that window is unrecoverable at this
>>> point then.
>>> 
>>> Would the commit deadlock affect only one btrfs filesystem or all of them on the
>>> machine?  I take it there is no automatic dmesg spew on extended deadlock?
>>> dmesg was completely clean at the time of the fault / reboot.
>> 
>> The kernel should have logged some message for things like a process
>> hanging for over 120s (the hung-task detector).
>> If you could recover that, it would help us locate the cause.
>> 
>> Normally such a deadlock should only affect the one unlucky filesystem that
>> hits the condition, not all filesystems.
>> But if you're unlucky enough, it may affect other filesystems as well.
>> 
>> Anyway, without enough info, it's really hard to say.
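For anyone hitting a similar hang, the diagnostics Qu is asking about can be captured before rebooting -- a sketch assuming root access and sysrq enabled; file names are illustrative, not from this incident:

```shell
# Check the hung-task detector (source of the "blocked for more than 120
# seconds" messages); it only logs if enabled and the timeout is non-zero:
sysctl kernel.hung_task_timeout_secs   # typically 120
sysctl kernel.hung_task_panic          # 0 = log only, 1 = panic on hang

# Dump backtraces of all tasks in uninterruptible (D) sleep -- the
# "sysrq+w" output -- and save it somewhere that survives the reboot:
echo 1 > /proc/sys/kernel/sysrq        # enable all sysrq functions
echo w > /proc/sysrq-trigger           # write blocked-task traces to dmesg
dmesg > /root/sysrq-w.txt              # preserve for the bug report
```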
>> 
>>> 
>>>>>
>>>>> If the latter, I'm somewhat surprised given the I/O load on the disk array in
>>>>> question, but it would also offer a clue as to why it hard locked the
>>>>> filesystem eventually (presumably on memory exhaustion -- the server has
>>>>> something like 128GB of RAM, so it could go quite a while before hitting the
>>>>> physical RAM limits).
>>>>>
>>>>>>
>>>>>>>  There was also significant corruption of certain files (e.g. LDAP MDB and MySQL
>>>>>>>  InnoDB) noted -- we restored from backup for those files, but are concerned
>>>>>>>  about the status of the entire filesystem at this point.
>>>>>>
>>>>>> Btrfs check is needed to ensure no metadata corruption.
>>>>>>
>>>>>> Also, we need sysrq+w output to determine where we are deadlocking.
>>>>>> Otherwise, it's really hard to find any clue from the report.
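The metadata check Qu recommends can be run offline -- a sketch; /dev/sdX is a placeholder for the real device, and the filesystem must be unmounted first:

```shell
# Read-only integrity check of the btrfs metadata; never modifies the disk.
btrfs check --readonly /dev/sdX

# Only consider --repair after taking a full image backup (e.g. with dd or
# btrfs-image) and, ideally, getting advice from the mailing list.
```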
>>>>>
>>>>> It would have been gathered if we'd known the filesystem was in this bad state.
>>>>> At the time, the priority was on restoring service and we had assumed NFS had
>>>>> just wedged itself (again).  It was only after reboot and remount that the
>>>>> damage slowly came to light.
>>>>>
>>>>> Do the described symptoms (what little we know of them at this point) line up
>>>>> with the issues fixed by https://patchwork.kernel.org/patch/11141559/ ?  Right
>>>>> now we're hoping that this particular issue was fixed by that series, but if
>>>>> not we might consider increasing backup frequency to nightly for this
>>>>> particular array and seeing if it happens again.
>>>>
>>>> That fix is already in v5.3, thus I don't think that's the case.
>>>>
>>>> Thanks,
>>>> Qu
>>> 
>>> Looking more carefully, the server in question had somehow been booted on
>>> 5.3-rc3, possibly because earlier versions were showing driver problems
>>> with the other hardware.  In any case, this machine was running 5.3-rc3,
>>> and the patch was created *after* the rc3 release.
>> 
>> If that's the case, simply upgrading the kernel should prevent this problem
>> from happening again.
>> And it's a relief that we don't have to face another deadly deadlock.
>> 
>> Thanks,
>> Qu
> 
> Here's the final information we gleaned from the disk image -- that is now being
> archived and we're moving on from this failure.
> 
> It doesn't look like a general commit failure, it looks like somehow specific
> directories were corrupted / automatically rolled back.  Again I wonder how
> much of this is due to the online resize; needless to say, we won't be doing
> that again -- future procedure will be to isolate the existing array, format a
> new array, transfer files, then restart the services.
> 
> btrfs-find-root returned the following:
> 
> =====
> These generations showed the missing files and also contained files from after
> the crash and restart:
> Well block 114904137728(gen: 295060 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> Well block 114679480320(gen: 295059 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> Well block 114592710656(gen: 295058 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> Well block 114092670976(gen: 295057 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> Well block 114844827648(gen: 295056 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> Well block 114618925056(gen: 295055 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> Well block 923598848(gen: 294112 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> Well block 495386624(gen: 294111 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> 
> =====
> This generation failed to recover any data whatsoever:
> Well block 92602368(gen: 294008 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> 
> =====
> Generations below do not show files created after the crash and restart, but the
> directories that would have contained the ~2 weeks of files are corrupted badly
> enough that they cannot be recovered.  Lots of "leaf parent key incorrect" on
> those directories; unknown if this is because of corruption that occurred prior
> to the crash or if this data was simply overwritten after remount and file
> restore.
> 
> Well block 299955716096(gen: 293446 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> Well block 299916853248(gen: 293446 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> Well block 299787747328(gen: 293445 level: 1) seems good, but generation/level
> doesn't match, want gen: 294909 level: 1
> 
> My confidence still isn't great that there isn't an underlying bug of some
> sort remaining in btrfs, but all we can really do at this point is keep an
> eye on it and increase the backup frequency.
> 
> Thanks!

For clarity, none of these roots allowed the files to be recovered.  They were simply missing from the latest generations, and the directories that would have contained them in earlier generations were too badly corrupted for btrfs restore to extract anything.
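For reference, the recovery attempt described above works by feeding each candidate root that btrfs-find-root reports to an offline, read-only extraction -- roughly like this; the device is a placeholder and the block number is one of the generations listed in the quoted output:

```shell
# List candidate tree roots (the "Well block ... gen: ..." lines above):
btrfs-find-root /dev/sdX

# Attempt a read-only extraction from an older generation, e.g. gen 293446
# at block 299955716096, into a scratch directory on a different filesystem:
mkdir -p /mnt/scratch/recovered
btrfs restore -t 299955716096 -iv /dev/sdX /mnt/scratch/recovered
```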

Thread overview: 14+ messages
2019-11-09 22:33 Unusual crash -- data rolled back ~2 weeks? Timothy Pearson
2019-11-09 22:48 ` Timothy Pearson
2019-11-10  3:38 ` Qu Wenruo
2019-11-10  6:47   ` Timothy Pearson
2019-11-10  6:54     ` Qu Wenruo
2019-11-10  7:18       ` Timothy Pearson
2019-11-10  7:45         ` Qu Wenruo
2019-11-10  7:48           ` Timothy Pearson
2019-11-10 10:02           ` Timothy Pearson
2019-11-10 20:10             ` Zygo Blaxell
2019-11-11 23:28           ` Timothy Pearson
2019-11-11 23:33             ` Timothy Pearson [this message]
2019-11-12 11:30             ` Chris Murphy
2019-11-10  8:04         ` Andrei Borzenkov
