From: Nikolay Borisov <nborisov@suse.com>
To: Alex Adriaanse <alex@oseberg.io>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Ongoing Btrfs stability issues
Date: Thu, 15 Feb 2018 22:42:00 +0200
Message-ID: <17374c30-4376-a96f-ee38-791a95676ae0@suse.com>
In-Reply-To: <2A1FE868-FE3A-489A-B600-8F460D3DFFC8@oseberg.io>



On 15.02.2018 21:41, Alex Adriaanse wrote:
> 
>> On Feb 15, 2018, at 12:00 PM, Nikolay Borisov <nborisov@suse.com> wrote:
>>
>> So in all of the cases you are hitting some form of premature enospc.
>> There was a fix that landed in 4.15 that should have fixed a rather
>> long-standing issue with the way metadata reservations are satisfied,
>> namely:
>>
>> 996478ca9c46 ("btrfs: change how we decide to commit transactions during
>> flushing").
>>
>> That commit was introduced in 4.14.3 stable kernel. Since you are not
>> using upstream kernel I'd advise you check whether the respective commit
>> is contained in the kernel versions you are using.
>>
>> Other than that in the reports you mentioned there is one crash in
>> __del_reloc_root which looks rather interesting, at the very least it
>> shouldn't crash...
> 
> I checked the Debian source code that's used for building the kernels that we run, and can confirm that both 4.14.7-1~bpo9+1 and 4.14.13-1~bpo9+1 contain the changes associated with the commit you referenced. So crash instances #2, #3, and #4 at https://bugzilla.kernel.org/show_bug.cgi?id=198787 were all running kernels that contain this fix already.
> 
> Could it be that some on-disk data structures got (silently) corrupted while we were running pre-4.14.7 kernels, and the aforementioned fix doesn't address anything relating to damage that has already been done? If so, is there a way to detect and/or repair this for existing filesystems other than running a "btrfs check --repair" or rebuilding filesystems (both of which require a significant amount of downtime)?

From the logs provided I can see only a single crash; the others are
just ENOSPC errors, which can cause corruption because delayed refs
(in the majority of cases) do not finish. Is btrfs hosted on an EBS
volume or on the instance's ephemeral storage? Is the EBS volume an
SSD? If so, are you using an I/O scheduler for those EBS devices? You
can check which I/O scheduler a device uses by reading the following
sysfs file:

/sys/block/<disk device>/queue/scheduler
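
If it helps, here is a minimal Python sketch of that check. It assumes
the usual sysfs convention that the file lists the available schedulers
on one line with the active one in square brackets (for example
"noop deadline [cfq]"). The helper names and the device name "xvdf" in
the usage note are only illustrative, not something taken from your
setup.

```python
import re
import sys
from pathlib import Path


def active_scheduler(text):
    """Return the scheduler shown in [brackets], or None if none is bracketed."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def read_scheduler(device, sysfs_root="/sys/block"):
    """Read <sysfs_root>/<device>/queue/scheduler and return the active entry."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # Device names are passed on the command line, e.g. "xvdf" (hypothetical).
    for dev in sys.argv[1:]:
        print(dev, read_scheduler(dev))
```

Saved under a name of your choosing and run with one or more device
names as arguments, it prints the active scheduler for each device.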


> 
> Thanks,
> 
> Alex
> 
