From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Nick Bowler <nbowler@draconx.ca>
Cc: Chris Murphy <lists@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>,
	Filipe Manana <fdmanana@kernel.org>
Subject: Re: Btrfs filesystem trashed after OOM scenario
Date: Thu, 26 Sep 2019 07:26:42 -0400
Message-ID: <5c94f84b-a3ed-7f67-0178-e531895c5128@gmail.com>
In-Reply-To: <CADyTPEw=g7y+DroBt+CO-=8T3=8kO5Muj6Ts3LrkwDtKx2=zcQ@mail.gmail.com>

On 2019-09-25 00:25, Nick Bowler wrote:
> On Tue, Sep 24, 2019, 18:34 Chris Murphy, <lists@colorremedies.com> wrote:
>> On Tue, Sep 24, 2019 at 4:04 PM Nick Bowler <nbowler@draconx.ca> wrote:
>>> - Running Linux 5.2.14, I pushed this system to OOM; the oom killer
>>> ran and killed some userspace tasks.  At this point many of the
>>> remaining tasks were stuck in uninterruptible sleeps.  Not really
>>> worried, I turned the machine off and on again to just get everything
>>> back to normal.  But I guess now that everything had gone horribly
>>> wrong already at this point...
>>
>> Yeah the kernel oomkiller is pretty much only about kernel
>> preservation, not user space preservation.
> 
> Indeed I am not bothered at all by needing to turn it off and on again
> in this situation.  But filesystems being completely trashed is
> another matter...
> 
>>> - Upon reboot, the system boots OK but now btrfs is throwing zillions
>>> of checksum errors.  After some time the filesystem is remounted
>>> readonly and I lose the ability to interact with the system at all, so
>>> it gets powered off.
>>>
>>> - Now the filesystem is unmountable.
>>
>> The transid errors look like they might be caused by the 5.2 regression
>>
>> https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u
>>
>> Fixed since 5.2.15 and 5.3.0.
> 
> Yikes, so my decision to update to the latest kernel two weeks ago
> perhaps was a very bad one.  Should've stuck with 4.19.y I guess.
> 
>> So if you're willing to blow shit up again, you can try to reproduce
>> with one of those.
> 
> Well I could try but it sounds like this might be hard to reproduce...
> 
>> I was also doing oomkiller blow shit up tests a few weeks ago with
>> these same problem kernels and never hit this bug, or any others. I
>> also had to do a LOT of force power offs because the system just
>> became totally wedged in and I had no way of estimating how long it
>> would be for recovery so after 30 minutes I hit the power button. Many
>> times. Zero corruptions. That's with a single Samsung 840 EVO in a
>> laptop relegated to such testing.
> 
> Just a thought... the system was alive but I was able to briefly
> inspect the situation and notice that tasks were blocked and
> unkillable... until my shell hung too and then I was hosed.  But I
> didn't hit the power button but rather rebooted with sysrq+e, sysrq+u,
> sysrq+b.  Not sure if that makes a difference.
Not sure if this mattered, but as a general rule, unless you're dealing
with an issue with the disk itself, you should always issue sysrq+s and
wait a few seconds (or, if you're on the console and can see kernel
messages, until the "Emergency Sync complete" message shows up) before
issuing sysrq+u.  Remounting all filesystems read-only through sysrq+u
does not reliably flush caches before forcing everything read-only.
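When this has to be scripted rather than typed, the same ordering can be
driven through /proc/sysrq-trigger, which accepts the same command keys
as the keyboard combination (as root).  Below is a minimal Python sketch;
the key order follows the advice above, but the wait times and function
names are hypothetical, and running it for real reboots the machine
immediately.

```python
import time
from typing import Callable, List, Tuple

# Writing a command key here (as root) is equivalent to Alt+SysRq+<key>.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"

# Hypothetical ordering: (key, seconds to wait afterwards).  The delays are
# guesses; watching for "Emergency Sync complete" is more reliable.
SEQUENCE: List[Tuple[str, float]] = [
    ("e", 2.0),  # send SIGTERM to every process except init
    ("s", 5.0),  # emergency sync: flush dirty data to disk
    ("u", 2.0),  # emergency remount of all filesystems read-only
    ("b", 0.0),  # reboot immediately, without syncing or unmounting
]


def write_sysrq(key: str, path: str = SYSRQ_TRIGGER) -> None:
    """Send one SysRq command by writing its key to the trigger file."""
    with open(path, "w") as f:
        f.write(key)


def emergency_reboot(trigger: Callable[[str], None] = write_sysrq,
                     sleep: Callable[[float], None] = time.sleep,
                     sequence: List[Tuple[str, float]] = SEQUENCE) -> List[str]:
    """Issue each key in order, pausing after it; returns the keys sent.

    `trigger` and `sleep` are injectable so the ordering can be checked
    without touching /proc/sysrq-trigger.
    """
    sent: List[str] = []
    for key, pause in sequence:
        trigger(key)
        sent.append(key)
        if pause > 0:
            sleep(pause)
    return sent
```

A dry run such as emergency_reboot(trigger=print, sleep=lambda _: None)
prints the keys in order without touching the kernel.  The point of the
ordering is simply that "s" lands, and gets time to finish, before "u".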
> 
>> Might be a different bug. Not sure. But also, this is with
>>
>>> [  347.551595] CPU: 3 PID: 1143 Comm: mount Not tainted 4.19.34-1-lts #1
>>
>> So I don't know how an older kernel will report on the problem caused
>> by the 5.2 bug.
> 
> This is the kernel from systemrescuecd.  I can try taking a disk image
> and mounting on another machine with a newer linux version.
> 
> Thanks,
>    Nick
> 
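Whether a given kernel falls in the window Chris describes (the 5.2
regression, fixed in 5.2.15 and 5.3.0) can be checked from its release
string.  Below is a minimal Python sketch, assuming the affected range is
exactly 5.2.0 through 5.2.14 as stated in this thread; the helper names
are hypothetical, and distribution kernels that backport fixes can carry
an older version number, so a match is only a hint.

```python
import re
from typing import Tuple


def parse_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from e.g. '4.19.34-1-lts' or '5.2'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)


def in_affected_range(release: str) -> bool:
    """True for 5.2.0 up to (not including) 5.2.15, per this thread.

    Assumption: -rc kernels are not distinguished from the final release,
    and vendor backports are not detected.
    """
    return (5, 2, 0) <= parse_release(release) < (5, 2, 15)
```

On a live system this would be called as
in_affected_range(platform.release()); the 4.19.34-1-lts rescue kernel
quoted above falls outside the range, as does 5.2.15.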



Thread overview: 5+ messages
2019-09-24 22:03 Btrfs filesystem trashed after OOM scenario Nick Bowler
2019-09-24 22:34 ` Chris Murphy
2019-09-25  4:25   ` Nick Bowler
2019-09-25  5:55     ` Chris Murphy
2019-09-26 11:26     ` Austin S. Hemmelgarn [this message]
