* BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing
From: Erkki Seppala @ 2015-10-22  5:32 UTC
  To: linux-btrfs

Hello,

Recently I added daily rebalancing to my cron.d (after finding myself in
an out-of-space situation), and not long after that I found my PC had
crashed overnight. There was no trace in the logs anywhere (not even in
the network logging, even though there should have been some), so I had
nothing to go on. Now it has crashed again after starting the rebalance,
and this time there was some information in the kernel log.
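The cron.d entry itself was not posted. For reference only, a hypothetical /etc/cron.d entry matching the 07:30 start time and the invocation given further down might look like this; the file name and the root user are assumptions. Unlike a personal crontab, cron.d files need the user field before the command.

```
# /etc/cron.d/btrfs-balance (hypothetical reconstruction, not the actual file)
# min hour dom mon dow user command
30    7    *   *   *   root /usr/local/sbin/btrfs-balance / 0 1 2 5 10 20 30 50 70
```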

Kernel version: 4.2.3 (package linux-image-4.2.0-1-amd64 version 4.2.3-1
from Debian Unstable)

The dump is available at:

  http://www.modeemi.fi/~flux/btrfs/btrfs-BUG-2015-10-55.txt

The log is available as well, with some unrelated USB and firewall
logging stripped (it shows that some kernel task hung for 120 seconds
yesterday evening, but that was on another btrfs filesystem and is
another story):

  http://www.modeemi.fi/~flux/btrfs/btrfs-2015-10-55.txt

I'm not quite sure which of the btrfs balance commands caused the
issue, but here is my script:

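#!/bin/sh
# Usage: btrfs-balance <mountpoint> <usage%>...
fs="$1"
if [ -z "$fs" ]; then
  echo "usage: btrfs-balance / 0 1 5 10 20 50" >&2
  exit 1
fi
shift
# Balance data chunks (d) first, then metadata chunks (m),
# once for each usage threshold given on the command line.
for usage in d m; do
  for a in "$@"; do
    date
    /bin/btrfs balance start -v "-${usage}usage=$a" "$fs"
  done
done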

And it was started at 07:30 with:

  /usr/local/sbin/btrfs-balance / 0 1 2 5 10 20 30 50 70
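To make the order of operations explicit, here is a dry-run sketch that prints, rather than runs, the balance commands the script would issue for this invocation. plan_balance is a hypothetical helper written for illustration; it only mirrors the script's two loops, so every data-chunk pass (-dusage) comes before any metadata-chunk pass (-musage). A usage filter of N restricts the balance to chunks that are less than N percent full, so the thresholds here grow from the cheapest passes to the most expensive ones.

```shell
#!/bin/sh
# Dry-run sketch: print the balance commands the btrfs-balance script
# would issue, in order, without touching any filesystem.
# plan_balance is a hypothetical helper, not part of btrfs-progs.
plan_balance() {
  fs="$1"
  shift
  # All data-chunk thresholds first, then all metadata-chunk thresholds.
  for usage in d m; do
    for a in "$@"; do
      echo "btrfs balance start -v -${usage}usage=$a $fs"
    done
  done
}

plan_balance / 0 1 2 5 10 20 30 50 70
```

For the arguments above it prints 18 commands, starting with `btrfs balance start -v -dusage=0 /` and ending with `btrfs balance start -v -musage=70 /`.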

I should add that the filesystem in question sits on MD RAID10, which
in turn is backed by four SSDs, so its I/O is reasonably fast, in case
that matters. There should not have been much competing I/O at the
time of the crash.

Before Duncan asks ;-), I only have a moderate number of subvolumes and
snapshots, i.e. one subvolume for each of /, /var/log/journal and /home,
24 snapshots of / and /home plus <10 snapshots of /.

Before that balance there was another balance on another BTRFS RAID10
filesystem, but given the timestamps I'm confident it wasn't the cause.

I don't really have any 'solution' other than disabling the periodic
rebalancing for the time being and running it only as needed, as I did
before.

Cheers,

-- 
  _____________________________________________________________________
     / __// /__ ____  __               http://www.modeemi.fi/~flux/\   \
    / /_ / // // /\ \/ /                                            \  /
   /_/  /_/ \___/ /_/\_\@modeemi.fi                                  \/



* Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing
From: Filipe Manana @ 2015-10-22  8:53 UTC
  To: Erkki Seppala; +Cc: linux-btrfs

On Thu, Oct 22, 2015 at 6:32 AM, Erkki Seppala <flux-btrfs@inside.org> wrote:
> [...]

Try this (just sent a few minutes ago):
https://patchwork.kernel.org/patch/7463161/

thanks




-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


* Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing
From: Stéphane Lesimple @ 2015-10-22 10:06 UTC
  To: fdmanana; +Cc: Erkki Seppala, linux-btrfs

On 2015-10-22 10:53, Filipe Manana wrote:
> [...]
> Try this (just sent a few minutes ago):
> https://patchwork.kernel.org/patch/7463161/


Awesome, I'll also try it right now under 4.3.0-rc6. My system is 
currently hit so hard by this bug that it no longer survives a balance 
for longer than a few minutes.

Will keep you posted on the outcome.

Thanks,

-- 
Stéphane.




* Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing
From: Erkki Seppala @ 2015-10-22 16:55 UTC
  To: linux-btrfs

Hello,

Thanks for the super-fast response :).

I've installed the patch and will wait and see. The effects should be
visible within a week, given daily rebalances of two filesystems.




* Re: BTRFS BUG at insert_inline_extent_backref+0xe3/0xf0 while rebalancing
From: Erkki Seppala @ 2015-10-30  8:57 UTC
  To: linux-btrfs

Filipe Manana <fdmanana@gmail.com> writes:
> Try this (just sent a few minutes ago):
> https://patchwork.kernel.org/patch/7463161/

I've been using this patch for a week now, doing two rebalances a day
(one per file system) - no problem so far. Thanks!

Probably unrelated to this, I did experience one reboot without any
trace, possibly because I had set panic = 10 and panic_on_oops = 1, but
it did not happen anywhere near the time a balance was running. I
wonder whether the hung task detector could trigger a reboot with that
configuration?
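Some background for the question above: with panic = 10 the kernel reboots 10 seconds after any panic, and panic_on_oops = 1 turns every oops (including a BUG() like the one in the subject line) into a panic. As far as I know, a hung task report such as the 120-second one mentioned in the first message is only a warning, not an oops, so on its own it should lead to a panic (and hence a reboot) only if kernel.hung_task_panic is also set to 1. The sketch below prints the relevant sysctls; show_panic_settings and its optional directory argument are hypothetical, added for illustration and testing, not an existing tool.

```shell
#!/bin/sh
# Sketch: report the sysctls that decide whether an oops or a hung task
# becomes a panic, and whether a panic becomes an automatic reboot.
# show_panic_settings is a hypothetical helper; the directory argument
# defaults to /proc/sys/kernel and exists only so it can be tested.
show_panic_settings() {
  dir="${1:-/proc/sys/kernel}"
  for key in panic panic_on_oops hung_task_panic hung_task_timeout_secs; do
    if [ -r "$dir/$key" ]; then
      echo "$key = $(cat "$dir/$key")"
    else
      # The hung_task_* files exist only when the kernel was built with
      # CONFIG_DETECT_HUNG_TASK.
      echo "$key = (not available)"
    fi
  done
}

show_panic_settings "$@"
```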

Thanks again for the great work, your detective work is always
impressive :).


