* btrfs kernel oops on mount
@ 2016-09-09 16:12 moparisthebest
  2016-09-09 17:51 ` Chris Murphy
  2016-09-09 18:47 ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 17+ messages in thread
From: moparisthebest @ 2016-09-09 16:12 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I'm hoping to get some help with mounting my btrfs array which quit
working yesterday.  My array was in the middle of a balance, about 50%
remaining, when it hit an error and remounted itself read-only [1].
btrfs fi show output [2], btrfs fi df output [3].

I unmounted the array, and when I tried to mount it again, it locked up
the whole system so even alt+sysrq would not work.  I rebooted, tried to
mount again, same lockup.  This was all kernel 4.5.7.

I rebooted to kernel 4.4.0, tried to mount, crashed again, this time a
message appeared on the screen and I took a picture [4].

I rebooted into an arch live system with kernel 4.7.2, tried to mount
again, got some dmesg output before it crashed [5] and took a picture
when it crashed [6], says in part 'BUG: unable to handle kernel NULL
pointer dereference at 00000000000001f0'.

Is there anything I can do to get this in a working state again or
perhaps even recover some data?

Thanks much for any help

[1]: https://www.moparisthebest.com/btrfs/initial_crash.txt
[2]: https://www.moparisthebest.com/btrfs/btrfsfishow.txt
[3]: https://www.moparisthebest.com/btrfs/btrfsdf.txt
[4]: https://www.moparisthebest.com/btrfsoops.jpg
[5]: https://www.moparisthebest.com/btrfs/dmsgprecrash.txt
[6]: https://www.moparisthebest.com/btrfsnulldereference.jpg

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: btrfs kernel oops on mount
  2016-09-09 16:12 btrfs kernel oops on mount moparisthebest
@ 2016-09-09 17:51 ` Chris Murphy
  2016-09-09 18:32   ` moparisthebest
  2016-09-10 15:13   ` moparisthebest
  2016-09-09 18:47 ` Austin S. Hemmelgarn
  1 sibling, 2 replies; 17+ messages in thread
From: Chris Murphy @ 2016-09-09 17:51 UTC (permalink / raw)
  To: moparisthebest; +Cc: Btrfs BTRFS

On Fri, Sep 9, 2016 at 10:12 AM, moparisthebest
<admin@moparisthebest.com> wrote:
> Hi,
>
> I'm hoping to get some help with mounting my btrfs array which quit
> working yesterday.  My array was in the middle of a balance, about 50%
> remaining, when it hit an error and remounted itself read-only [1].
> btrfs fi show output [2], btrfs df output [3].
>
> I unmounted the array, and when I tried to mount it again, it locked up
> the whole system so even alt+sysrq would not work.  I rebooted, tried to
> mount again, same lockup.  This was all kernel 4.5.7.
>
> I rebooted to kernel 4.4.0, tried to mount, crashed again, this time a
> message appeared on the screen and I took a picture [4].
>
> I rebooted into an arch live system with kernel 4.7.2, tried to mount
> again, got some dmesg output before it crashed [5] and took a picture
> when it crashed [6], says in part 'BUG: unable to handle kernel NULL
> pointer dereference at 00000000000001f0'.
>
> Is there anything I can do to get this in a working state again or
> perhaps even recover some data?
>
> Thanks much for any help
>
> [1]: https://www.moparisthebest.com/btrfs/initial_crash.txt
> [2]: https://www.moparisthebest.com/btrfs/btrfsfishow.txt
> [3]: https://www.moparisthebest.com/btrfs/btrfsdf.txt
> [4]: https://www.moparisthebest.com/btrfsoops.jpg
> [5]: https://www.moparisthebest.com/btrfs/dmsgprecrash.txt
> [6]: https://www.moparisthebest.com/btrfsnulldereference.jpg

Good report. On the 4.7.2 kernel system, open two consoles. Have one
ready with 'echo w > /proc/sysrq-trigger' typed as root (sudo doesn't work)
but don't issue it; mount in the other console, then switch back
and issue the sysrq. It'll take a while: minutes maybe even to switch
consoles, then more for the command itself to issue, and then
minutes before the result actually gets committed to the systemd journal
or /var/log/messages. If it's a systemd system, and if you have to
force reboot to regain control, you can get the sysrq output with 'journalctl
-b-1 -k > outputfile.txt'

btrfs check output is also useful to include (without --repair
for starters).

The thing that concerns me is an occasional problem that comes up
with lzo compressed volumes. Duncan knows more about that
one so he may chime in. I would definitely only do default mounts for
the above; don't include the compression option. You could also try -o
ro,recovery and see where that gets you.
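
The sequence above could be scripted roughly as follows. This is only a
sketch: the run() helper, its DRY_RUN switch, and the function names are
made up for illustration, and the device and mount point are whatever the
caller passes in. By default it only prints the commands so they can be
reviewed first.

```shell
#!/bin/sh
# Sketch of the capture sequence. Helper and function names are
# hypothetical; only the commands themselves come from the message above.

# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Second console: plain mount with default options (no compression option).
mount_default() {
    run mount "$1" "$2"
}

# Alternative attempt: read-only mount with the recovery option.
mount_ro_recovery() {
    run mount -o ro,recovery "$1" "$2"
}

# First console, as root: dump blocked tasks into the kernel log.
dump_blocked_tasks() {
    run sh -c 'echo w > /proc/sysrq-trigger'
}

# After a forced reboot: save kernel messages from the previous boot to $1.
save_previous_boot_log() {
    run sh -c "journalctl -b -1 -k > $1"
}
```

For example, 'mount_ro_recovery /dev/sdX /mnt' prints the mount command
in dry-run mode; run it again with DRY_RUN=0 as root to actually execute it.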


-- 
Chris Murphy

* Re: btrfs kernel oops on mount
  2016-09-09 17:51 ` Chris Murphy
@ 2016-09-09 18:32   ` moparisthebest
  2016-09-09 18:49     ` Austin S. Hemmelgarn
  2016-09-09 19:21     ` Chris Murphy
  2016-09-10 15:13   ` moparisthebest
  1 sibling, 2 replies; 17+ messages in thread
From: moparisthebest @ 2016-09-09 18:32 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On 09/09/2016 01:51 PM, Chris Murphy wrote:
> On Fri, Sep 9, 2016 at 10:12 AM, moparisthebest
> <admin@moparisthebest.com> wrote:
>> Hi,
>>
>> I'm hoping to get some help with mounting my btrfs array which quit
>> working yesterday.  My array was in the middle of a balance, about 50%
>> remaining, when it hit an error and remounted itself read-only [1].
>> btrfs fi show output [2], btrfs df output [3].
>>
>> I unmounted the array, and when I tried to mount it again, it locked up
>> the whole system so even alt+sysrq would not work.  I rebooted, tried to
>> mount again, same lockup.  This was all kernel 4.5.7.
>>
>> I rebooted to kernel 4.4.0, tried to mount, crashed again, this time a
>> message appeared on the screen and I took a picture [4].
>>
>> I rebooted into an arch live system with kernel 4.7.2, tried to mount
>> again, got some dmesg output before it crashed [5] and took a picture
>> when it crashed [6], says in part 'BUG: unable to handle kernel NULL
>> pointer dereference at 00000000000001f0'.
>>
>> Is there anything I can do to get this in a working state again or
>> perhaps even recover some data?
>>
>> Thanks much for any help
>>
>> [1]: https://www.moparisthebest.com/btrfs/initial_crash.txt
>> [2]: https://www.moparisthebest.com/btrfs/btrfsfishow.txt
>> [3]: https://www.moparisthebest.com/btrfs/btrfsdf.txt
>> [4]: https://www.moparisthebest.com/btrfsoops.jpg
>> [5]: https://www.moparisthebest.com/btrfs/dmsgprecrash.txt
>> [6]: https://www.moparisthebest.com/btrfsnulldereference.jpg
> 
> Good report. Try on the 4.7.2 kernel system, two consoles, have one
> ready with 'echo w > /proc/sysrq-trigger' as root (sudo doesn't work)
> but don't issue it, mount in the other console and then switch back
> and issue the sysrq. It'll take a while, minutes maybe even to switch
> consoles, and then also for the command itself to issue, and then
> minutes before the result actually gets committed to systemd journal
> or var/log/messages. If it's a systemd system, and if you have to
> force reboot to regain control, you can get the sysrq with 'journalctl
> -b-1 -k > outputfile.txt'
> 
> Also btrfs check output is useful to include also (without --repair
> for starters).
> 
> The thing that concerns me is this occasional problem that comes up
> sometimes with lzo compressed volumes. Duncan knows more about that
> one so he may chime in. I would definitely only do default mounts for
> the above, don't include the compression option. You could also try -o
> ro,recovery and see where that gets you.
> 
> 

This is indeed an lzo compressed system, it's always been mounted with
that option anyhow.

btrfs check has been running for ~6 hours so far, I'll follow up with
output on that when it finishes.

Hmm, the problem with the 4.7.2/systemd system is it's a live usb system
so the log/journal wouldn't be saved anywhere except tmpfs, I'll see
what I can rig up unless someone has any amazing ideas?  I'm still brand
new to systemd...

Thanks!

* Re: btrfs kernel oops on mount
  2016-09-09 16:12 btrfs kernel oops on mount moparisthebest
  2016-09-09 17:51 ` Chris Murphy
@ 2016-09-09 18:47 ` Austin S. Hemmelgarn
  2016-09-09 19:23   ` moparisthebest
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-09 18:47 UTC (permalink / raw)
  To: moparisthebest, linux-btrfs

On 2016-09-09 12:12, moparisthebest wrote:
> Hi,
>
> I'm hoping to get some help with mounting my btrfs array which quit
> working yesterday.  My array was in the middle of a balance, about 50%
> remaining, when it hit an error and remounted itself read-only [1].
> btrfs fi show output [2], btrfs df output [3].
>
> I unmounted the array, and when I tried to mount it again, it locked up
> the whole system so even alt+sysrq would not work.  I rebooted, tried to
> mount again, same lockup.  This was all kernel 4.5.7.
>
> I rebooted to kernel 4.4.0, tried to mount, crashed again, this time a
> message appeared on the screen and I took a picture [4].
>
> I rebooted into an arch live system with kernel 4.7.2, tried to mount
> again, got some dmesg output before it crashed [5] and took a picture
> when it crashed [6], says in part 'BUG: unable to handle kernel NULL
> pointer dereference at 00000000000001f0'.
>
> Is there anything I can do to get this in a working state again or
> perhaps even recover some data?
>
> Thanks much for any help
>
> [1]: https://www.moparisthebest.com/btrfs/initial_crash.txt
> [2]: https://www.moparisthebest.com/btrfs/btrfsfishow.txt
> [3]: https://www.moparisthebest.com/btrfs/btrfsdf.txt
> [4]: https://www.moparisthebest.com/btrfsoops.jpg
> [5]: https://www.moparisthebest.com/btrfs/dmsgprecrash.txt
> [6]: https://www.moparisthebest.com/btrfsnulldereference.jpg

The output from btrfs fi show and fi df both indicate that the 
filesystem is essentially completely full.  You've gotten to the point 
where you're using the global metadata reserve, and I think things are 
getting stuck trying (and failing) to reclaim the space that's used 
there.  The fact that the kernel is crashing in response to this is 
concerning, but it isn't surprising as this is not something that's 
really all that tested, and is very much not a normal operational 
scenario.  I'm guessing that the error you hit that forced the 
filesystem read-only is something that requires recovery, which in turn 
requires copy-on-write updates of some of the metadata, which you have 
essentially zero room for, and that's what's causing the kernel to choke 
when trying to mount the filesystem.

Given that the FS is pretty much wedged, I think your best bet for 
fixing this is probably going to be to use btrfs restore to get the data 
onto a new (larger) set of disks.  If you do take this approach, a 
metadata dump might be useful, if somebody could find enough room to 
extract it.

Alternatively, because of the small amount of free space on the largest 
device in the array, you _might_ be able to fix things if you can get it 
mounted read-write by running a balance converting both data and 
metadata to single profiles, adding a few more disks (or replacing some 
with bigger ones), and then converting back to raid1 profiles.  This is 
exponentially more risky than just restoring to a new filesystem, and 
will almost certainly take longer.

A couple of other things to comment about on this:
1. 'can_overcommit' (the function that the Arch kernel choked on) is 
from the memory management subsystem.  The fact that that's throwing a 
null pointer says to me either your hardware has issues, or the Arch 
kernel itself has problems (which would probably mean the kernel image 
is corrupted).
2. You may want to look for more symmetrically sized disks if you're 
going to be using raid1 mode.  The space that's free on the last listed 
disk in the filesystem is unusable in raid1 mode because there are no 
other disks with usable space.
3. In general, it's a good idea to keep an eye on space usage on your 
filesystems.  If it's getting to be more than about 95% full, you should 
be looking at getting some more storage space.  This is especially true 
for BTRFS, as a 100% full BTRFS filesystem functionally becomes 
permanently read-only because there's nowhere for the copy-on-write 
updates to write to.

* Re: btrfs kernel oops on mount
  2016-09-09 18:32   ` moparisthebest
@ 2016-09-09 18:49     ` Austin S. Hemmelgarn
  2016-09-09 19:21     ` Chris Murphy
  1 sibling, 0 replies; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-09 18:49 UTC (permalink / raw)
  To: moparisthebest, Chris Murphy; +Cc: Btrfs BTRFS

On 2016-09-09 14:32, moparisthebest wrote:
> On 09/09/2016 01:51 PM, Chris Murphy wrote:
>> On Fri, Sep 9, 2016 at 10:12 AM, moparisthebest
>> <admin@moparisthebest.com> wrote:
>>> Hi,
>>>
>>> I'm hoping to get some help with mounting my btrfs array which quit
>>> working yesterday.  My array was in the middle of a balance, about 50%
>>> remaining, when it hit an error and remounted itself read-only [1].
>>> btrfs fi show output [2], btrfs df output [3].
>>>
>>> I unmounted the array, and when I tried to mount it again, it locked up
>>> the whole system so even alt+sysrq would not work.  I rebooted, tried to
>>> mount again, same lockup.  This was all kernel 4.5.7.
>>>
>>> I rebooted to kernel 4.4.0, tried to mount, crashed again, this time a
>>> message appeared on the screen and I took a picture [4].
>>>
>>> I rebooted into an arch live system with kernel 4.7.2, tried to mount
>>> again, got some dmesg output before it crashed [5] and took a picture
>>> when it crashed [6], says in part 'BUG: unable to handle kernel NULL
>>> pointer dereference at 00000000000001f0'.
>>>
>>> Is there anything I can do to get this in a working state again or
>>> perhaps even recover some data?
>>>
>>> Thanks much for any help
>>>
>>> [1]: https://www.moparisthebest.com/btrfs/initial_crash.txt
>>> [2]: https://www.moparisthebest.com/btrfs/btrfsfishow.txt
>>> [3]: https://www.moparisthebest.com/btrfs/btrfsdf.txt
>>> [4]: https://www.moparisthebest.com/btrfsoops.jpg
>>> [5]: https://www.moparisthebest.com/btrfs/dmsgprecrash.txt
>>> [6]: https://www.moparisthebest.com/btrfsnulldereference.jpg
>>
>> Good report. Try on the 4.7.2 kernel system, two consoles, have one
>> ready with 'echo w > /proc/sysrq-trigger' as root (sudo doesn't work)
>> but don't issue it, mount in the other console and then switch back
>> and issue the sysrq. It'll take a while, minutes maybe even to switch
>> consoles, and then also for the command itself to issue, and then
>> minutes before the result actually gets committed to systemd journal
>> or var/log/messages. If it's a systemd system, and if you have to
>> force reboot to regain control, you can get the sysrq with 'journalctl
>> -b-1 -k > outputfile.txt'
>>
>> Also btrfs check output is useful to include also (without --repair
>> for starters).
>>
>> The thing that concerns me is this occasional problem that comes up
>> sometimes with lzo compressed volumes. Duncan knows more about that
>> one so he may chime in. I would definitely only do default mounts for
>> the above, don't include the compression option. You could also try -o
>> ro,recovery and see where that gets you.
>>
>>
>
> This is indeed an lzo compressed system, it's always been mounted with
> that option anyhow.
>
> btrfs check has been running for ~6 hours so far, I'll follow up with
> output on that when it finishes.
>
> Hmm, the problem with the 4.7.2/systemd system is it's a live usb system
> so the log/journal wouldn't be saved anywhere except tmpfs, I'll see
> what I can rig up unless someone has any amazing ideas?  I'm still brand
> new to systemd...
I don't know much about systemd myself, but I do know it's possible to 
set up a remote journal (essentially a remote logging server like people 
have been doing for decades with syslogd).  I don't know if this would 
catch the error or not though.  Alternatively, if you could set up a 
serial console, you could capture all the output there instead without 
even having to touch the journal.


* Re: btrfs kernel oops on mount
  2016-09-09 18:32   ` moparisthebest
  2016-09-09 18:49     ` Austin S. Hemmelgarn
@ 2016-09-09 19:21     ` Chris Murphy
  1 sibling, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2016-09-09 19:21 UTC (permalink / raw)
  To: moparisthebest; +Cc: Chris Murphy, Btrfs BTRFS

On Fri, Sep 9, 2016 at 12:32 PM, moparisthebest
<admin@moparisthebest.com> wrote:

> This is indeed an lzo compressed system, it's always been mounted with
> that option anyhow.
>
> btrfs check has been running for ~6 hours so far, I'll follow up with
> output on that when it finishes.
>
> Hmm, the problem with the 4.7.2/systemd system is it's a live usb system
> so the log/journal wouldn't be saved anywhere except tmpfs, I'll see
> what I can rig up unless someone has any amazing ideas?  I'm still brand
> new to systemd...

Pick the easier of:
1.
ssh from a remote computer; the blocked tasks will slow down sshd and
the responsiveness of everything, but it shouldn't totally inhibit it
and may be more reliable than a local VT if the command is pretyped
and ready to go before you initiate the mount. Use journalctl -fk to
follow, and save the output as a text file on that remote
computer.
2.
netconsole might be more reliable than sshd in this case; again just
connect with a remote computer, and in its Terminal you can do:
journalctl -fk
3.
Create a file system on a USB stick partition, copy live's /var to the
stick, then mount the stick over the live's /var, and now it's
read-write. And then:
           mkdir -p /var/log/journal
           systemd-tmpfiles --create --prefix /var/log/journal

I think that will cause systemd-journald to flush to /var now, you can do:
journalctl -b | grep journald

And see if you have lines like this:
Sep 09 09:11:05 f24m systemd-journald[238]: Journal stopped
Sep 09 09:11:06 f24m systemd-journald[549]: Runtime journal
(/run/log/journal/) is 8.0M, max 393.2M, 385.2M free.
Sep 09 09:11:06 f24m systemd-journald[549]: System journal
(/var/log/journal/) is 999.7M, max 1.0G, 24.2M free.
Sep 09 09:11:07 f24m systemd-journald[549]: Time spent on flushing to
/var is 1.040757s for 1490 entries.
Sep 09 09:11:07 f24m systemd-journald[238]: Received SIGTERM from PID
1 (systemd).

So what happens when you force reboot? Mount this stick, and use
'journalctl -D /mnt/log/journal/machineid/ > outputfile.txt' which
will point to the journal binary file and write it out to a text file.
You could try -k to filter out just kernel messages but since that
implies -b and you have a different boot than what's in this journal I
have no idea off hand if that will work;  you could also filter by |
grep kernel > outputfile.txt but maybe not every line will have kernel
in it? I just tried it  with sysrq t and everything relevant seems to
have "kernel" in each line.


They're probably in order of ease; but not sure which is more reliable
when things are being blocked. Network may be more or less blocked
*shrug* I'd use XFS for the stick file system for /var.
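
Option 3 spelled out as a script might look something like this. It is a
sketch under assumptions: the run() helper, the DRY_RUN switch, the
function names and the staging mount point are invented for illustration,
and the stick partition is whatever you pass in. Note that mkfs.xfs wipes
that partition.

```shell
#!/bin/sh
# Sketch of option 3. Helper and function names are hypothetical.

# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: USB stick partition (will be reformatted), $2: temporary mount point.
setup_persistent_journal() {
    stick="$1"
    staging="$2"
    run mkfs.xfs "$stick"                 # destroys existing data on the stick
    run mount "$stick" "$staging"
    run cp -a /var/. "$staging"/          # copy the live system's /var
    run umount "$staging"
    run mount "$stick" /var               # stick now backs a writable /var
    run mkdir -p /var/log/journal
    run systemd-tmpfiles --create --prefix /var/log/journal
}

# After the forced reboot: $1 is the journal directory on the mounted
# stick, $2 the text file to write.
export_journal() {
    run sh -c "journalctl -D $1 > $2"
}
```

In dry-run mode 'setup_persistent_journal /dev/sdX1 /mnt' just lists the
seven commands in order, which is a cheap way to double-check the device
name before running anything for real.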


Chris Murphy

* Re: btrfs kernel oops on mount
  2016-09-09 18:47 ` Austin S. Hemmelgarn
@ 2016-09-09 19:23   ` moparisthebest
  2016-09-09 22:09     ` Duncan
  2016-09-12 11:37     ` Austin S. Hemmelgarn
  2016-09-09 19:28   ` Chris Murphy
  2016-09-12 12:33   ` Jeff Mahoney
  2 siblings, 2 replies; 17+ messages in thread
From: moparisthebest @ 2016-09-09 19:23 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, linux-btrfs

On 09/09/2016 02:47 PM, Austin S. Hemmelgarn wrote:
> On 2016-09-09 12:12, moparisthebest wrote:
>> Hi,
>>
>> I'm hoping to get some help with mounting my btrfs array which quit
>> working yesterday.  My array was in the middle of a balance, about 50%
>> remaining, when it hit an error and remounted itself read-only [1].
>> btrfs fi show output [2], btrfs df output [3].
>>
>> I unmounted the array, and when I tried to mount it again, it locked up
>> the whole system so even alt+sysrq would not work.  I rebooted, tried to
>> mount again, same lockup.  This was all kernel 4.5.7.
>>
>> I rebooted to kernel 4.4.0, tried to mount, crashed again, this time a
>> message appeared on the screen and I took a picture [4].
>>
>> I rebooted into an arch live system with kernel 4.7.2, tried to mount
>> again, got some dmesg output before it crashed [5] and took a picture
>> when it crashed [6], says in part 'BUG: unable to handle kernel NULL
>> pointer dereference at 00000000000001f0'.
>>
>> Is there anything I can do to get this in a working state again or
>> perhaps even recover some data?
>>
>> Thanks much for any help
>>
>> [1]: https://www.moparisthebest.com/btrfs/initial_crash.txt
>> [2]: https://www.moparisthebest.com/btrfs/btrfsfishow.txt
>> [3]: https://www.moparisthebest.com/btrfs/btrfsdf.txt
>> [4]: https://www.moparisthebest.com/btrfsoops.jpg
>> [5]: https://www.moparisthebest.com/btrfs/dmsgprecrash.txt
>> [6]: https://www.moparisthebest.com/btrfsnulldereference.jpg
> 
> The output from btrfs fi show and fi df both indicate that the
> filesystem is essentially completely full.  You've gotten to the point
> where you're using the global metadata reserve, and I think things are
> getting stuck trying (and failing) to reclaim the space that's used
> there.  The fact that the kernel is crashing in response to this is
> concerning, but it isn't surprising as this is not something that's
> really all that tested, and is very much not a normal operational
> scenario.  I'm guessing that the error you hit that forced the
> filesystem read-only is something that requires recovery, which in turn
> requires copy-on-write updates of some of the metadata, which you have
> essentially zero room for, and that's what's causing the kernel to choke
> when trying to mount the filesystem.
> 
> Given that the FS is pretty much wedged, I think your best bet for
> fixing this is probably going to be to use btrfs restore to get the data
> onto a new (larger) set of disks.  If you do take this approach, a
> metadata dump might be useful, if somebody could find enough room to
> extract it.
> 
> Alternatively, because of the small amount of free space on the largest
> device in the array, you _might_ be able to fix things if you can get it
> mounted read-write by running a balance converting both data and
> metadata to single profiles, adding a few more disks (or replacing some
> with bigger ones), and then converting back to raid1 profiles.  This is
> exponentially more risky than just restoring to a new filesystem, and
> will almost certainly take longer.
> 
> A couple of other things to comment about on this:
> 1. 'can_overcommit' (the function that the Arch kernel choked on) is
> from the memory management subsystem.  The fact that that's throwing a
> null pointer says to me either your hardware has issues, or the Arch
> kernel itself has problems (which would probably mean the kernel image
> is corrupted).
> 2. You may want to look for more symmetrically sized disks if you're
> going to be using raid1 mode.  The space that's free on the last listed
> disk in the filesystem is unusable in raid1 mode because there are no
> other disks with usable space.
> 3. In general, it's a good idea to keep an eye on space usage on your
> filesystems.  If it's getting to be more than about 95% full, you should
> be looking at getting some more storage space.  This is especially true
> for BTRFS, as a 100% full BTRFS filesystem functionally becomes
> permanently read-only because there's nowhere for the copy-on-write
> updates to write to.

If I read btrfs fi show right, it's got minimum ~600gb free on each one
of the 8 drives, shouldn't that be more than enough for most things?  (I
guess unless I have single files over 600gb that need COW'd, I don't though)

Didn't ubuntu on kernel 4.4 die in the same can_overcommit function?
(https://www.moparisthebest.com/btrfsoops.jpg) what kind of hardware
issues would cause a repeatable kernel crash like that?  Like am I
looking at memory issues or the SAS controller or what?

Thanks!

* Re: btrfs kernel oops on mount
  2016-09-09 18:47 ` Austin S. Hemmelgarn
  2016-09-09 19:23   ` moparisthebest
@ 2016-09-09 19:28   ` Chris Murphy
  2016-09-10 18:50     ` moparisthebest
  2016-09-12 12:33   ` Jeff Mahoney
  2 siblings, 1 reply; 17+ messages in thread
From: Chris Murphy @ 2016-09-09 19:28 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: moparisthebest, Btrfs BTRFS

On Fri, Sep 9, 2016 at 12:47 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
>
> The output from btrfs fi show and fi df both indicate that the filesystem is
> essentially completely full.

What am I missing?

https://www.moparisthebest.com/btrfs/btrfsfishow.txt

There are thousands of GiB totally unallocated. Just taking the last
two devices:

devid   13 size 3.64TiB used 3.04TiB path /dev/mapper/fourtb5
devid   14 size 7.28TiB used 6.21TiB path /dev/mapper/eighttb

There's plenty of room for it to allocate some 600GiB of new metadata
or data chunks, mirrored on just these two devices. None of the others
is totally full either.
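
As a quick cross-check of those two lines, unallocated space per device is
just size minus used. A small sketch that does the subtraction on
'btrfs fi show' style devid lines; the function name is made up here, and
it assumes both figures are reported in TiB (lines in other units are
skipped):

```shell
#!/bin/sh
# Reads "devid N size X used Y path P" lines on stdin and prints
# "P <size - used>TiB" for each line where both values are in TiB.
unallocated_per_device() {
    awk '$1 == "devid" && $4 ~ /TiB$/ && $6 ~ /TiB$/ {
        size = $4; used = $6
        sub(/TiB$/, "", size)
        sub(/TiB$/, "", used)
        printf "%s %.2fTiB\n", $8, size - used
    }'
}
```

Fed the two lines quoted above, this gives about 0.60TiB for fourtb5 and
1.07TiB for eighttb, which is where the "some 600GiB" mirrored across
these two devices comes from.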

Sounds like with enospc the devs want to see a couple of things beyond what I
asked for: a mount with the enospc_debug option, and the output of

grep -IR . /sys/fs/btrfs/UUID/allocation/

That's kinda hard to do right now if it's not mounting though...
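
If it does mount at some point, collecting the allocation data could be
wrapped like this. A sketch only: the function name is made up, and the
optional second argument (an alternate sysfs root) is my addition so the
function can be tried against a scratch directory.

```shell
#!/bin/sh
# $1: filesystem UUID as shown by 'btrfs fi show'
# $2: sysfs root for btrfs, defaults to /sys/fs/btrfs
dump_allocation() {
    uuid="$1"
    sysfs_root="${2:-/sys/fs/btrfs}"
    # Prints every non-empty line of every text file under allocation/,
    # prefixed with the file it came from.
    grep -IR . "$sysfs_root/$uuid/allocation/"
}
```

Something like 'dump_allocation UUID > allocation.txt', with the real
UUID substituted, would then give one file to attach to a reply.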



-- 
Chris Murphy

* Re: btrfs kernel oops on mount
  2016-09-09 19:23   ` moparisthebest
@ 2016-09-09 22:09     ` Duncan
  2016-09-12 11:37     ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 17+ messages in thread
From: Duncan @ 2016-09-09 22:09 UTC (permalink / raw)
  To: linux-btrfs

moparisthebest posted on Fri, 09 Sep 2016 15:23:13 -0400 as excerpted:

> On 09/09/2016 02:47 PM, Austin S. Hemmelgarn wrote:
>> On 2016-09-09 12:12, moparisthebest wrote:
>>> Hi,
>>>
>>> I'm hoping to get some help with mounting my btrfs array which quit
>>> working yesterday.  My array was in the middle of a balance, about 50%
>>> remaining, when it hit an error and remounted itself read-only [1].
>>> btrfs fi show output [2], btrfs df output [3].
>>>
>>> I unmounted the array, and when I tried to mount it again, it locked
>>> up the whole system so even alt+sysrq would not work.  I rebooted,
>>> tried to mount again, same lockup.  This was all kernel 4.5.7.
>>>
>>> I rebooted to kernel 4.4.0, tried to mount, crashed again, this time a
>>> message appeared on the screen and I took a picture [4].
>>>
>>> I rebooted into an arch live system with kernel 4.7.2, tried to mount
>>> again, got some dmesg output before it crashed [5] and took a picture
>>> when it crashed [6], says in part 'BUG: unable to handle kernel NULL
>>> pointer dereference at 00000000000001f0'.
>>>
>>> Is there anything I can do to get this in a working state again or
>>> perhaps even recover some data?
>>>
>>> Thanks much for any help
>>>
>>> [1]: https://www.moparisthebest.com/btrfs/initial_crash.txt
>>> [2]: https://www.moparisthebest.com/btrfs/btrfsfishow.txt
>>> [3]: https://www.moparisthebest.com/btrfs/btrfsdf.txt
>>> [4]: https://www.moparisthebest.com/btrfsoops.jpg
>>> [5]: https://www.moparisthebest.com/btrfs/dmsgprecrash.txt
>>> [6]: https://www.moparisthebest.com/btrfsnulldereference.jpg
>> 
>> The output from btrfs fi show and fi df both indicate that the
>> filesystem is essentially completely full.  You've gotten to the point
>> where you're using the global metadata reserve, and I think things are
>> getting stuck trying (and failing) to reclaim the space that's used
>> there.

>> Given that the FS is pretty much wedged, I think your best bet for
>> fixing this is probably going to be to use btrfs restore to get the
>> data onto a new (larger) set of disks.  If you do take this approach, a
>> metadata dump might be useful, if somebody could find enough room to
>> extract it.

> If I read btrfs fi show right, it's got minimum ~600gb free on each one
> of the 8 drives, shouldn't that be more than enough for most things?  (I
> guess unless I have single files over 600gb that need COW'd, I don't
> though)

Austin did pick up on something I (and apparently Chris) missed, the non-
zero used global reserve, but as best I can tell he's wrongly attributing 
it to fully used devices, when as you (and Chris) point out that's not 
the case.

What he picked up on is this.  Under normal conditions, global reserve 
"used" should always be zero, as sans bugs, btrfs has to be in pretty 
dire lack of space condition before it'll start using the reserve.  Under 
most conditions, btrfs will simply ENOSPC an operation before it starts 
using reserve, so the fact that it's used indicates that btrfs *BELIEVES* 
that it is in dire straits, space-wise, and has no place to go *but* 
reserves.

But as you point out, all eight devices seem to have a half-TiB plus 
available, unallocated and free to allocate as necessary.  Given that 
btrfs raid1 only does pair-mirroring, and that chunks should be at 
absolute largest, 10 GiB, there's *plenty* of space to allocate as needed.


Which can only mean that you've hit one of those elusive ENOSPC bugs 
where there's plenty of space left to allocate, but btrfs simply refuses 
to allocate it, instead triggering ENOSPC errors left and right, and of 
particular interest here, btrfs believes the ENOSPC problems to be severe 
enough that it has even run substantially into global reserves, *DESPITE* 
there *actually* being *plenty* of space!

Now I'm not a dev (just a btrfs user and list regular) and the traces, 
etc, don't tend to add much usable information for me, so I can't judge 
whether your particular case is affected by the following or not, but as 
it so happens, there's active patches going into 4.8 dealing with some of 
these previously unsolved ENOSPC when there's *plenty* of space bugs.

So there's a fair chance the patches in either current 4.8-git or still 
in-process at this very moment will fix at least the evident false ENOSPC 
despite loads of space actually being available, which based on the fact 
that used reserve is /not/ zero was very likely the original trigger for 
the auto-remount-ro.  However, it's also possible that there are other 
issues now as well, that the current patches may /not/ fix, even if they 
fix all the ENOSPC issues, which itself I can't guarantee.  But it's 
worth a shot.

The other known problem with a known (mount-option) fix that you're 
almost certainly running into ATM is the unfinished balance, since the 
balance will try to resume once you mount the btrfs writable, and at 
least without the ENOSPC patches mentioned above, that balance is 
immediately running into the same ENOSPC problem that triggered the 
remount-ro in the first place.

So try adding skip_balance to your mount options, and see if that lets 
you mount without the crash.  If it does, you can then manually run btrfs 
balance cancel to cancel the ongoing balance, allowing you to mount 
normally (without skip_balance) again.  However, you might want to try 
the ENOSPC patches first, before canceling the balance, since the cancel 
by definition will lose your place in the balance, and presumably you 
were doing a balance for some reason and would thus have to restart it.
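
In command form, that would be roughly the following. It is a sketch, not
something tested on this array: the run() helper, DRY_RUN switch and
function names are invented for illustration, and the device and mount
point are placeholders supplied by the caller.

```shell
#!/bin/sh
# Sketch of the skip_balance approach. Helper and function names are
# hypothetical; the mount option and the cancel command are as described.

# Print commands instead of running them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Mount without letting the interrupted balance resume.
mount_skip_balance() {
    run mount -o skip_balance "$1" "$2"
}

# Only once you've decided to give up your place in the balance.
cancel_balance() {
    run btrfs balance cancel "$1"
}
```

Keeping the cancel as a separate step is deliberate: the mount alone
tells you whether the resuming balance was the trigger, without losing
your place in it.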


So what I'd try, in order:

0) Btrfs is still considered stabilizing, not yet fully stable and 
mature, so the usual sysadmin's rule of backups, that you either have 
them or by virtue of skipping them, you're defining your data as of less 
value to you than the hassle and resources a backup would otherwise 
require, regardless of any claims to the contrary, applies even more 
strongly than it does to a normally stable and mature filesystem.

So if you don't have backups (or they're outdated) and you are now 
reconsidering your definition of that data as not worth the hassle of 
backups, your first priority is getting those backups, even before repair 
of the filesystem.

If that is your case, I'd try mounting read-only and taking the backup 
from there if you can, or using btrfs restore if you can't mount read-
only.

Then of course be aware of what a failure to have backups actually means 
in terms of how you are defining the value of your data (or the value of 
the delta between the current data and the data at the time of the last 
backup, if you have them but they aren't absolutely current), and act 
accordingly.  If that means btrfs is no longer an appropriate choice for 
you due to the stronger backups rule application, that's what it means.

1) A quick mount with skip_balance using your existing kernel, just to 
see if it lets you mount without an immediate crash.  If it does, we know 
it was the resuming balance that was the problem.  But don't cancel the 
balance just yet so you don't lose your place in it.

Of course if the option works, this is a nice place to take/update 
backups too. =:^)

2) A mount with the very latest 4.8-rc or git kernel, possibly with 
further enospc patches applied (I'm not sure if they've all reached 
mainline yet).  If you're really lucky, these enospc patches will let you 
continue the existing in-process balance from where you left off, thus 
avoiding the cancel.

If you're less lucky but still in good shape, they'll fix the root 
problem but the balance already got the btrfs so wedged that you'll still 
have to mount with skip_balance, then cancel the balance, losing your 
place, and then presumably restart a new one.

3) Given the currently active enospc work, find the threads discussing 
those patches and either confirm that they fixed your enospc problem, or 
catch up on the status of the current patches and what sorts of debugging 
and testing the devs are having reporters do, and either confirm a 
remaining issue on those threads or get prepared to file a new bug, if the 
issue appears to be yet another enospc bug that isn't addressed by those 
patches.
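
For step 0 above (read-only mount first, btrfs restore as the fallback), 
a minimal sketch might look like the following.  The device, mount point 
and backup destination are placeholders, and run() is a made-up dry-run 
helper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. Device, mount point and destination are placeholders;
# run() is a hypothetical dry-run helper (prints unless DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Step 0: get the data off before attempting any repair.
rescue_data() {
    dev=$1
    mnt=$2
    dest=$3
    if run mount -o ro "$dev" "$mnt"; then
        # Read-only mount worked: plain copy to the backup destination.
        run cp -a "$mnt/." "$dest/"
    else
        # No mount possible: btrfs restore reads the unmounted device.
        run btrfs restore "$dev" "$dest"
    fi
}
```

In dry-run mode only the mount/cp branch is ever printed, since the 
printed mount "succeeds"; with DRY_RUN=0 the fallback to btrfs restore 
is taken when the read-only mount actually fails.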

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: btrfs kernel oops on mount
  2016-09-09 17:51 ` Chris Murphy
  2016-09-09 18:32   ` moparisthebest
@ 2016-09-10 15:13   ` moparisthebest
  1 sibling, 0 replies; 17+ messages in thread
From: moparisthebest @ 2016-09-10 15:13 UTC (permalink / raw)
  To: Btrfs BTRFS

On 09/09/2016 01:51 PM, Chris Murphy wrote:
> Also btrfs check output is useful to include also (without --repair
> for starters).

btrfs check --readonly output is here:

https://www.moparisthebest.com/btrfs/btrfscheck.txt

*Most* of it anyway, I messed up with tmux and it took 20 hours to run
so I don't really want to run it again unless you need me to.  Now that
check is over I'll try the other things suggested.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: btrfs kernel oops on mount
  2016-09-09 19:28   ` Chris Murphy
@ 2016-09-10 18:50     ` moparisthebest
  0 siblings, 0 replies; 17+ messages in thread
From: moparisthebest @ 2016-09-10 18:50 UTC (permalink / raw)
  To: Btrfs BTRFS

On 09/09/2016 03:28 PM, Chris Murphy wrote:
> Sounds like with enospc devs want to see a couple things beyond what I
> asked for:
> 
> enospc_debug
> grep -IR . /sys/fs/btrfs/UUID/allocation/
> 
> That's kinda hard to do right now if it's not mounting though...

I managed to get more output from arch/4.7.2 using netconsole, I did end
up with duplicate lines somehow though which uniq fixed, but some of the
crash is mixed together on the same line, I didn't mess with that for
fear of taking out something important:

https://www.moparisthebest.com/btrfs/archnetconsole.txt

I was also able to mount it ro so I ran the grep you asked for:

https://www.moparisthebest.com/btrfs/enospcdebug.txt
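
For anyone wanting to repeat this: the UUID in that path is the
filesystem UUID.  A small sketch, assuming a read-only mount at the
placeholder /mnt/array; the function name and its optional second
argument (an alternate sysfs root, only there so it can be tried against
a scratch directory) are made up for illustration.

```shell
#!/bin/sh
# Sketch only: dump_allocation is a hypothetical wrapper around the grep
# requested above.
# Usage: dump_allocation UUID [SYSFS_ROOT]
dump_allocation() {
    uuid=$1
    root=${2:-/sys/fs/btrfs}
    # -I skips binary files, -R recurses; "." matches any non-empty
    # line, so every attribute is printed as path:value.
    grep -IR . "$root/$uuid/allocation/"
}
```

Something like dump_allocation "$(findmnt -n -o UUID /mnt/array)" should
then give the same kind of listing as the file linked above.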

I tried mounting with mount -o rw,skip_balance and it still locked up,
so for now it's read-only...

Let me know what else I can provide or try.  I haven't been able to boot
with a 4.7 kernel on my ubuntu install so I figure 4.8 will be the same,
I guess I'll need to permanently install something like arch to a flash
drive and try 4.8 from there.

Thanks!


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: btrfs kernel oops on mount
  2016-09-09 19:23   ` moparisthebest
  2016-09-09 22:09     ` Duncan
@ 2016-09-12 11:37     ` Austin S. Hemmelgarn
  2016-09-12 13:32       ` moparisthebest
  1 sibling, 1 reply; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-12 11:37 UTC (permalink / raw)
  To: moparisthebest, linux-btrfs

On 2016-09-09 15:23, moparisthebest wrote:
> On 09/09/2016 02:47 PM, Austin S. Hemmelgarn wrote:
>> On 2016-09-09 12:12, moparisthebest wrote:
>>> Hi,
>>>
>>> I'm hoping to get some help with mounting my btrfs array which quit
>>> working yesterday.  My array was in the middle of a balance, about 50%
>>> remaining, when it hit an error and remounted itself read-only [1].
>>> btrfs fi show output [2], btrfs df output [3].
>>>
>>> I unmounted the array, and when I tried to mount it again, it locked up
>>> the whole system so even alt+sysrq would not work.  I rebooted, tried to
>>> mount again, same lockup.  This was all kernel 4.5.7.
>>>
>>> I rebooted to kernel 4.4.0, tried to mount, crashed again, this time a
>>> message appeared on the screen and I took a picture [4].
>>>
>>> I rebooted into an arch live system with kernel 4.7.2, tried to mount
>>> again, got some dmesg output before it crashed [5] and took a picture
>>> when it crashed [6], says in part 'BUG: unable to handle kernel NULL
>>> pointer dereference at 00000000000001f0'.
>>>
>>> Is there anything I can do to get this in a working state again or
>>> perhaps even recover some data?
>>>
>>> Thanks much for any help
>>>
>>> [1]: https://www.moparisthebest.com/btrfs/initial_crash.txt
>>> [2]: https://www.moparisthebest.com/btrfs/btrfsfishow.txt
>>> [3]: https://www.moparisthebest.com/btrfs/btrfsdf.txt
>>> [4]: https://www.moparisthebest.com/btrfsoops.jpg
>>> [5]: https://www.moparisthebest.com/btrfs/dmsgprecrash.txt
>>> [6]: https://www.moparisthebest.com/btrfsnulldereference.jpg
>>
>> The output from btrfs fi show and fi df both indicate that the
>> filesystem is essentially completely full.  You've gotten to the point
>> where you're using the global metadata reserve, and I think things are
>> getting stuck trying (and failing) to reclaim the space that's used
>> there.  The fact that the kernel is crashing in response to this is
>> concerning, but it isn't surprising as this is not something that's
>> really all that tested, and is very much not a normal operational
>> scenario.  I'm guessing that the error you hit that forced the
>> filesystem read-only is something that requires recovery, which in turn
>> requires copy-on-write updates of some of the metadata, which you have
>> essentially zero room for, and that's what's causing the kernel to choke
>> when trying to mount the filesystem.
>>
>> Given that the FS is pretty much wedged, I think your best bet for
>> fixing this is probably going to be to use btrfs restore to get the data
>> onto a new (larger) set of disks.  If you do take this approach, a
>> metadata dump might be useful, if somebody could find enough room to
>> extract it.
>>
>> Alternatively, because of the small amount of free space on the largest
>> device in the array, you _might_ be able to fix things if you can get it
>> mounted read-write by running a balance converting both data and
>> metadata to single profiles, adding a few more disks (or replacing some
>> with bigger ones), and then converting back to raid1 profiles.  This is
>> exponentially more risky than just restoring to a new filesystem, and
>> will almost certainly take longer.
>>
>> A couple of other things to comment about on this:
>> 1. 'can_overcommit' (the function that the Arch kernel choked on) is
>> from the memory management subsystem.  The fact that that's throwing a
>> null pointer says to me either your hardware has issues, or the Arch
>> kernel itself has problems (which would probably mean the kernel image
>> is corrupted).
>> 2. You may want to look for more symmetrically sized disks if you're
>> going to be using raid1 mode.  The space that's free on the last listed
>> disk in the filesystem is unusable in raid1 mode because there are no
>> other disks with usable space.
>> 3. In general, it's a good idea to keep an eye on space usage on your
>> filesystems.  If it's getting to be more than about 95% full, you should
>> be looking at getting some more storage space.  This is especially true
>> for BTRFS, as a 100% full BTRFS filesystem functionally becomes
>> permanently read-only because there's nowhere for the copy-on-write
>> updates to write to.
>
> If I read btrfs fi show right, it's got minimum ~600gb free on each one
> of the 8 drives, shouldn't that be more than enough for most things?  (I
> guess unless I have single files over 600gb that need COW'd, I don't though)
Ah, you're right, I misread all but the last line, sorry about the 
confusion!

That said, like Duncan mentioned, the fact the GlobalReserve is 
allocated combined with all that free space is not a good sign.  I think 
that balancing metadata may use it sometimes, but I'm not certain, and 
it definitely should not be showing anything allocated there in normal 
usage.
>
> Didn't ubuntu on kernel 4.4 die in the same can_overcommit function?
> (https://www.moparisthebest.com/btrfsoops.jpg) what kind of hardware
> issues would cause a repeatable kernel crash like that?  Like am I
> looking at memory issues or the SAS controller or what?
It doesn't look like it died in can_overcommit, as that's not anywhere 
on the stack trace.  The second item on the stack though 
(btrfs_async_reclaim_metadata_space) at least partly reinforces the 
suspicion that something is messed up in the filesystem's metadata (which 
could explain the allocations in GlobalReserve, which is a subset of the 
Metadata chunks).  It looks like each crash was in a different place, 
but at least the first two could easily be different parts of the kernel 
choking on the same thing.  As far as the crash in can_overcommit, that 
combined with the apparent corrupted metadata makes me think there may 
be a hardware problem.  The first thing I'd check in that respect is the 
cabling to the drives themselves, followed by system RAM, the PSU, and 
the storage controller.  I generally check in that order because 
it's trivial to check the cabling, and not all that difficult to check 
the RAM and PSU (and RAM is more likely to go bad than the PSU), and 
properly checking a storage controller is extremely difficult unless you 
have another known working one you can swap it for (and even then, it's 
only practical to check if you know the state on disk won't cause the 
kernel to choke).
>
> Thanks!
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: btrfs kernel oops on mount
  2016-09-09 18:47 ` Austin S. Hemmelgarn
  2016-09-09 19:23   ` moparisthebest
  2016-09-09 19:28   ` Chris Murphy
@ 2016-09-12 12:33   ` Jeff Mahoney
  2016-09-12 12:54     ` Austin S. Hemmelgarn
  2 siblings, 1 reply; 17+ messages in thread
From: Jeff Mahoney @ 2016-09-12 12:33 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, moparisthebest, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1148 bytes --]

On 9/9/16 8:47 PM, Austin S. Hemmelgarn wrote:
> A couple of other things to comment about on this:
> 1. 'can_overcommit' (the function that the Arch kernel choked on) is
> from the memory management subsystem.  The fact that that's throwing a
> null pointer says to me either your hardware has issues, or the Arch
> kernel itself has problems (which would probably mean the kernel image
> is corrupted).

fs/btrfs/extent-tree.c:
static int can_overcommit(struct btrfs_root *root,
                          struct btrfs_space_info *space_info, u64 bytes,
                          enum btrfs_reserve_flush_enum flush)

> 3. In general, it's a good idea to keep an eye on space usage on your
> filesystems.  If it's getting to be more than about 95% full, you should
> be looking at getting some more storage space.  This is especially true
> for BTRFS, as a 100% full BTRFS filesystem functionally becomes
> permanently read-only because there's nowhere for the copy-on-write
> updates to write to.

The entire point of having the global metadata reserve is to avoid that
situation.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: btrfs kernel oops on mount
  2016-09-12 12:33   ` Jeff Mahoney
@ 2016-09-12 12:54     ` Austin S. Hemmelgarn
  2016-09-12 13:27       ` Jeff Mahoney
  0 siblings, 1 reply; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-12 12:54 UTC (permalink / raw)
  To: Jeff Mahoney, moparisthebest, linux-btrfs

On 2016-09-12 08:33, Jeff Mahoney wrote:
> On 9/9/16 8:47 PM, Austin S. Hemmelgarn wrote:
>> A couple of other things to comment about on this:
>> 1. 'can_overcommit' (the function that the Arch kernel choked on) is
>> from the memory management subsystem.  The fact that that's throwing a
>> null pointer says to me either your hardware has issues, or the Arch
>> kernel itself has problems (which would probably mean the kernel image
>> is corrupted).
>
> fs/btrfs/extent-tree.c:
> static int can_overcommit(struct btrfs_root *root,
>                           struct btrfs_space_info *space_info, u64 bytes,
>                           enum btrfs_reserve_flush_enum flush)
>
OK, my bad there, but that begs the question: why does a BTRFS function 
not have a BTRFS prefix?  The name blatantly sounds like a mm function 
(and I could have sworn I came across one with an almost identical name 
when I was trying to understand the mm code a couple months ago), and 
the lack of a prefix combined with that heavily implies that it's a core 
kernel function.

Given this, it's almost certainly the balance choking on corrupted 
metadata that's causing the issue.
>> 3. In general, it's a good idea to keep an eye on space usage on your
>> filesystems.  If it's getting to be more than about 95% full, you should
>> be looking at getting some more storage space.  This is especially true
>> for BTRFS, as a 100% full BTRFS filesystem functionally becomes
>> permanently read-only because there's nowhere for the copy-on-write
>> updates to write to.
>
> The entire point of having the global metadata reserve is to avoid that
> situation.
Except that the global metadata reserve is usually only just barely big 
enough, and it only works for metadata.  While I get that this issue is 
what it's supposed to fix, it doesn't do so in a way that makes it easy 
to get out of that situation.  The reserve itself is often not big 
enough to do anything in any reasonable amount of time once the FS gets 
beyond about a hundred GB and you start talking about very large files.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: btrfs kernel oops on mount
  2016-09-12 12:54     ` Austin S. Hemmelgarn
@ 2016-09-12 13:27       ` Jeff Mahoney
  2016-09-12 13:58         ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Mahoney @ 2016-09-12 13:27 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, moparisthebest, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3094 bytes --]

On 9/12/16 2:54 PM, Austin S. Hemmelgarn wrote:
> On 2016-09-12 08:33, Jeff Mahoney wrote:
>> On 9/9/16 8:47 PM, Austin S. Hemmelgarn wrote:
>>> A couple of other things to comment about on this:
>>> 1. 'can_overcommit' (the function that the Arch kernel choked on) is
>>> from the memory management subsystem.  The fact that that's throwing a
>>> null pointer says to me either your hardware has issues, or the Arch
>>> kernel itself has problems (which would probably mean the kernel image
>>> is corrupted).
>>
>> fs/btrfs/extent-tree.c:
>> static int can_overcommit(struct btrfs_root *root,
>>                           struct btrfs_space_info *space_info, u64 bytes,
>>                           enum btrfs_reserve_flush_enum flush)
>>
> OK, my bad there, but that begs the question: why does a BTRFS function
> not have a BTRFS prefix?  The name blatantly sounds like a mm function
> (and I could have sworn I came across one with an almost identical name
> when I was trying to understand the mm code a couple months ago), and
> the lack of a prefix combined with that heavily implies that it's a core
> kernel function.
> 
> Given this, it's almost certainly the balance choking on corrupted
> metadata that's causing the issue.

Because it's a static function and has a namespace limited to the
current C file.  If we prefixed every function in a local namespace with
the subsystem, the code would be unreadable.  At any rate, the full
symbol name in the Oops is:

can_overcommit+0x1e/0x110 [btrfs]

So we do identify the proper namespace in the Oops already.
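
To make the pieces of such a symbol explicit, here is a small sketch that
splits it into function, offset, function size and module.  The helper
name parse_oops_symbol is made up for illustration; for a symbol built
into the core kernel there is no bracketed module name, so the module
field comes out empty.

```shell
#!/bin/sh
# Sketch only: parse_oops_symbol is a hypothetical helper.
# Splits "func+0xOFF/0xSIZE [module]" as printed in an Oops.
parse_oops_symbol() {
    printf '%s\n' "$1" | sed -n -E \
        's#^([A-Za-z0-9_.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)( \[([^]]+)\])?$#function=\1 offset=\2 size=\3 module=\5#p'
}

# Example:
#   parse_oops_symbol 'can_overcommit+0x1e/0x110 [btrfs]'
# prints:
#   function=can_overcommit offset=0x1e size=0x110 module=btrfs
```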

>>> 3. In general, it's a good idea to keep an eye on space usage on your
>>> filesystems.  If it's getting to be more than about 95% full, you should
>>> be looking at getting some more storage space.  This is especially true
>>> for BTRFS, as a 100% full BTRFS filesystem functionally becomes
>>> permanently read-only because there's nowhere for the copy-on-write
>>> updates to write to.
>>
>> The entire point of having the global metadata reserve is to avoid that
>> situation.
> Except that the global metadata reserve is usually only just barely big
> enough, and it only works for metadata.  While I get that this issue is
> what it's supposed to fix, it doesn't do so in a way that makes it easy
> to get out of that situation.  The reserve itself is often not big
> enough to do anything in any reasonable amount of time once the FS gets
> beyond about a hundred GB and you start talking about very large files.

Why would it need to apply to data?  The reserve is used to meet the
reservation requirements to CoW metadata blocks needed to release the
data blocks.  The data blocks themselves aren't touched; they're only
released.  The size of the file really should only matter in terms of
how many extent items need to be released but it shouldn't matter at all
in terms of how many blocks the file's data occupies.  E.g. a 100 GB
file that uses a handful of extents would be essentially free in this
context.
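
One rough way to see how many extents a given file uses is filefrag from
e2fsprogs, which prints a summary line of the form "NAME: N extents
found".  A small sketch that pulls out the number; count_extents is a
made-up helper name and bigfile.img is a placeholder.

```shell
#!/bin/sh
# Sketch only: count_extents is a hypothetical helper that reads
# filefrag's summary line on stdin and prints just the extent count.
count_extents() {
    sed -n -E 's/^.*: ([0-9]+) extents? found$/\1/p'
}

# Usage (bigfile.img is a placeholder):
#   filefrag bigfile.img | count_extents
```

The count, rather than the file size, is what should drive how much
metadata has to be CoWed to release the file.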

-Jeff

-- 
Jeff Mahoney
SUSE Labs



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: btrfs kernel oops on mount
  2016-09-12 11:37     ` Austin S. Hemmelgarn
@ 2016-09-12 13:32       ` moparisthebest
  0 siblings, 0 replies; 17+ messages in thread
From: moparisthebest @ 2016-09-12 13:32 UTC (permalink / raw)
  To: linux-btrfs

On 09/12/2016 07:37 AM, Austin S. Hemmelgarn wrote:
>> On 2016-09-09 15:23, moparisthebest wrote:
>> Didn't ubuntu on kernel 4.4 die in the same can_overcommit function?
>> (https://www.moparisthebest.com/btrfsoops.jpg) what kind of hardware
>> issues would cause a repeatable kernel crash like that?  Like am I
>> looking at memory issues or the SAS controller or what?
> It doesn't look like it died in can_overcommit, as that's not anywhere
> on the stack trace.  The second item on the stack though
> (btrfs_async_reclaim_metadata_space) at least partly reinforces the
> suspicion that something is messed up in the filesystem's metadata (which
> could explain the allocations in GlobalReserve, which is a subset of the
> Metadata chunks).  It looks like each crash was in a different place,
> but at least the first two could easily be different parts of the kernel
> choking on the same thing.  As far as the crash in can_overcommit, that
> combined with the apparent corrupted metadata makes me think there may
> be a hardware problem.  The first thing I'd check in that respect is the
> cabling to the drives themselves, followed by system RAM, the PSU, and
> the storage controller.  I generally check in that order because
> it's trivial to check the cabling, and not all that difficult to check
> the RAM and PSU (and RAM is more likely to go bad than the PSU), and
> properly checking a storage controller is extremely difficult unless you
> have another known working one you can swap it for (and even then, it's
> only practical to check if you know the state on disk won't cause the
> kernel to choke).

The first RIP: line (https://www.moparisthebest.com/btrfsoops.jpg) ends
in 'can_overcommit+0x1e/0xf0 [btrfs]', apologies for that being a
literal picture of a CRT instead of a searchable text file, doesn't
exactly make things easy... :(

Still I'm relieved that more points to bad metadata than to bad hardware.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: btrfs kernel oops on mount
  2016-09-12 13:27       ` Jeff Mahoney
@ 2016-09-12 13:58         ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2016-09-12 13:58 UTC (permalink / raw)
  To: Jeff Mahoney, moparisthebest, linux-btrfs

On 2016-09-12 09:27, Jeff Mahoney wrote:
> On 9/12/16 2:54 PM, Austin S. Hemmelgarn wrote:
>> On 2016-09-12 08:33, Jeff Mahoney wrote:
>>> On 9/9/16 8:47 PM, Austin S. Hemmelgarn wrote:
>>>> A couple of other things to comment about on this:
>>>> 1. 'can_overcommit' (the function that the Arch kernel choked on) is
>>>> from the memory management subsystem.  The fact that that's throwing a
>>>> null pointer says to me either your hardware has issues, or the Arch
>>>> kernel itself has problems (which would probably mean the kernel image
>>>> is corrupted).
>>>
>>> fs/btrfs/extent-tree.c:
>>> static int can_overcommit(struct btrfs_root *root,
>>>                           struct btrfs_space_info *space_info, u64 bytes,
>>>                           enum btrfs_reserve_flush_enum flush)
>>>
>> OK, my bad there, but that begs the question: why does a BTRFS function
>> not have a BTRFS prefix?  The name blatantly sounds like a mm function
>> (and I could have sworn I came across one with an almost identical name
>> when I was trying to understand the mm code a couple months ago), and
>> the lack of a prefix combined with that heavily implies that it's a core
>> kernel function.
>>
>> Given this, it's almost certainly the balance choking on corrupted
>> metadata that's causing the issue.
>
> Because it's a static function and has a namespace limited to the
> current C file.  If we prefixed every function in a local namespace with
> the subsystem, the code would be unreadable.  At any rate, the full
> symbol name in the Oops is:
>
> can_overcommit+0x1e/0x110 [btrfs]
>
> So we do identify the proper namespace in the Oops already.
Which somehow I missed...

Again, apologies for the confusion, I'm not used to reading an OOPS out 
of a picture of a CRT, and less so when trying to get someone help as 
quick as possible.
>
>>>> 3. In general, it's a good idea to keep an eye on space usage on your
>>>> filesystems.  If it's getting to be more than about 95% full, you should
>>>> be looking at getting some more storage space.  This is especially true
>>>> for BTRFS, as a 100% full BTRFS filesystem functionally becomes
>>>> permanently read-only because there's nowhere for the copy-on-write
>>>> updates to write to.
>>>
>>> The entire point of having the global metadata reserve is to avoid that
>>> situation.
>> Except that the global metadata reserve is usually only just barely big
>> enough, and it only works for metadata.  While I get that this issue is
>> what it's supposed to fix, it doesn't do so in a way that makes it easy
>> to get out of that situation.  The reserve itself is often not big
>> enough to do anything in any reasonable amount of time once the FS gets
>> beyond about a hundred GB and you start talking about very large files.
>
> Why would it need to apply to data?  The reserve is used to meet the
> reservation requirements to CoW metadata blocks needed to release the
> data blocks.  The data blocks themselves aren't touched; they're only
> released.  The size of the file really should only matter in terms of
> how many extent items need to be released but it shouldn't matter at all
> in terms of how many blocks the file's data occupies.  E.g. a 100 GB
> file that uses a handful of extents would be essentially free in this
> context.
I'm not saying it needs to apply to data, but it would be nice if things 
didn't blow up such that you immediately have to start deleting files or 
add more space when the data chunks become full.

As far as the sizing, I have had multiple times where the largest file 
in the filesystem couldn't be deleted because of the number of extents 
when the rest of the FS was full and GlobalReserve was being used for 
metadata operations (I don't know if it's significant, but I only saw 
this on filesystems with compress=lzo).

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-09-12 13:58 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-09 16:12 btrfs kernel oops on mount moparisthebest
2016-09-09 17:51 ` Chris Murphy
2016-09-09 18:32   ` moparisthebest
2016-09-09 18:49     ` Austin S. Hemmelgarn
2016-09-09 19:21     ` Chris Murphy
2016-09-10 15:13   ` moparisthebest
2016-09-09 18:47 ` Austin S. Hemmelgarn
2016-09-09 19:23   ` moparisthebest
2016-09-09 22:09     ` Duncan
2016-09-12 11:37     ` Austin S. Hemmelgarn
2016-09-12 13:32       ` moparisthebest
2016-09-09 19:28   ` Chris Murphy
2016-09-10 18:50     ` moparisthebest
2016-09-12 12:33   ` Jeff Mahoney
2016-09-12 12:54     ` Austin S. Hemmelgarn
2016-09-12 13:27       ` Jeff Mahoney
2016-09-12 13:58         ` Austin S. Hemmelgarn
