From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: "Swâmi Petaramesh" <swami@petaramesh.org>,
	"Anand Jain" <anand.jain@oracle.com>
Cc: Lionel Bouton <lionel-subscription@bouton.name>,
	linux-btrfs@vger.kernel.org
Subject: Re: Massive filesystem corruption since kernel 5.2 (ARCH)
Date: Thu, 8 Aug 2019 18:12:49 +0800	[thread overview]
Message-ID: <ae330c40-4cd5-2e1f-feae-210d7af85a70@gmx.com> (raw)
In-Reply-To: <22973d72-5709-c705-1c8d-1b438df1cc49@petaramesh.org>



On 2019/8/8 5:55 PM, Swâmi Petaramesh wrote:
> Hi Qu,
>
> On 8/8/19 10:46 AM, Qu Wenruo wrote:
>> Follow up questions about the corruption.
>>
>> Is there enough free space (not only unallocated, but allocated bg) for
>> metadata?
>>
>> As further digging into the case, it looks like btrfs is even harder to
>> get corrupted for tree blocks.
>>
>> If we have enough metadata free space, we will try to allocate tree
>> blocks at bytenr sequence, without reusing old bytenr until there is not
>> enough space or hit the end of the block group.
>>
>> This means, even we have something wrong implementing barrier, we still
>> won't write new data to old tree blocks (even several trans ago).
>
>
> It's kind of hard for me to say whether the 2 filesystems that got
> corrupted lacked allocated metadata space at any time, and both
> filesystems have now been reformatted, so I cannot tell.
>
> What I can be 100% sure of is that I never got any “No space left on
> device” ENOSPC on any of them.

No need to hit ENOSPC. However, determining the metadata block group
usage would need extra info, so I didn't expect that to be easy to get.
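To illustrate the allocation behavior described in the quoted text above, here is a toy model. This is an illustrative sketch, not actual btrfs code (the class and names are hypothetical): tree blocks get fresh, increasing bytenrs within a block group, and bytenrs freed by earlier transactions are not handed out again until the fresh space runs out.

```python
# Toy model of forward-only tree block allocation within one block group.
# NOT actual btrfs code: real btrfs uses extent trees and free-space
# caching, but the idea sketched here is the same -- locations of tree
# blocks from earlier transactions are not rewritten while fresh space
# remains, so a broken write barrier alone would not clobber them.

class BlockGroup:
    BLOCKSIZE = 16 * 1024  # typical btrfs nodesize (16 KiB)

    def __init__(self, start, size):
        self.end = start + size
        self.cursor = start   # next fresh bytenr to hand out
        self.freed = []       # bytenrs freed by earlier transactions

    def free(self, bytenr):
        # A tree block freed by an old transaction; its bytenr is
        # remembered but not reused right away.
        self.freed.append(bytenr)

    def alloc(self):
        # Forward allocation: prefer a fresh bytenr, so tree blocks
        # written by earlier transactions are never overwritten...
        if self.cursor + self.BLOCKSIZE <= self.end:
            bytenr = self.cursor
            self.cursor += self.BLOCKSIZE
            return bytenr
        # ...until the block group runs out of fresh space; only then
        # does the allocator fall back to bytenrs freed earlier.
        if self.freed:
            return self.freed.pop(0)
        raise RuntimeError("block group exhausted")

bg = BlockGroup(start=1024 * 1024, size=4 * BlockGroup.BLOCKSIZE)
old = bg.alloc()   # tree block written by transaction N
bg.free(old)       # freed by transaction N+1
new = bg.alloc()   # a fresh bytenr -- `old` is not reused yet
assert new != old
```

The interesting case is a nearly-full metadata block group: once the cursor reaches the end, freed bytenrs finally get recycled, which is exactly the situation where a barrier or ordering bug could damage old tree blocks.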

>
> *BUT* the SSD on which the machine runs may have run close to full as I
> had copied a bunch of ISOs on it shortly before upgrading packages - and
> kernel.
>
> However, the upgrade seemingly went well and I didn't see any ENOSPC
> at any time.
>
>
> On the external HD that went corrupt as well, I'm pretty sure it
> happened as follows:
>
> - I started a full backup onto it in an emergency;
>
> - I asked myself « Will I have enough space? » and checked with “df”.
>
> - There were still several dozen GB free, but not enough for a full
> system backup. I cannot tell whether these had been allocated in the past.
>
> - Noticing that I would run out of HD space (though well before it
> actually happened), I deleted a large number of snapshots from the HD.
>
> - I thus assume that the deletion of snapshots would have freed a good
> amount of data AND metadata space.
>
> So the situation on the external HD was that a full backup was in
> progress while a vast number of snapshots had been deleted in the meantime.
>
> After that, the FS got corrupted at some point.
>
>
> For the internal SSD, it looks like the kernel upgrade went well and
> the machine rebooted OK; then midnight came, and with it probably the
> cron task that performs “snapper” timeline snapshot deletion.
>
>
> Then the machine was turned off and rebooted next day, and by that time
> the FS was corrupt.
>
>
> So I strongly suspect the issue has something to do with snapshot
> deletion, but I cannot tell more.

I have also been working on that in recent days, but haven't got any
clue yet. (In fact, I just found that btrfs is harder to corrupt when
there is enough metadata space.)
But I will definitely keep digging.

>
>
> It may be worth noting that the machine has been running a lot since
> I reverted to kernel 5.1 and reformatted the filesystems, and that no
> corruption has occurred since, even though I have performed quite a
> lot of backups on the external HD after it was reformatted.
>
> Everything is in the exact same setup as before, except for the kernel.
>
> So I would definitely rule out a hardware problem on the machine:
> it's now running as fine as it ever did.
>
> I plan to retry upgrading to Arch kernel 5.2 in the coming weeks after
> having performed a full disk binary clone in case it happens again.
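A full binary clone as mentioned above is typically taken with dd. The sketch below uses image files as safe stand-ins; the device names in the comments are placeholders (assumptions), so verify them with `lsblk` before running against real disks.

```shell
# Demonstrated on plain files so the commands are safe to run anywhere;
# for a real disk clone, replace source.img / clone.img with the actual
# device nodes (e.g. /dev/sdX -> /dev/sdY -- placeholders only, and the
# target's contents are destroyed).

# Stand-in for the source disk.
dd if=/dev/urandom of=source.img bs=1M count=4 status=none

# The clone itself: a straight binary copy, flushed to stable storage.
dd if=source.img of=clone.img bs=4M conv=fsync status=none

# Verify the copy is bit-identical before trusting it as a fallback.
cmp source.img clone.img && echo "clone verified"
```

Cloning before the upgrade means the corrupted state, if it reproduces, can be captured and re-examined instead of being lost to a reformat.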
>
> (However I've seen that Arch has released 3-4 kernel 5.2 package updates
> since, so it won't be the exact same kernel by the time I test again).

No problem; not that many fixes have been backported, and none of them
are really high priority, so I'd say it would not make much difference.

>
> I will be on vacation until August 20, so I cannot perform this test
> before I'm back.
>
> But I'll be glad to help if I can and thank you very much for your help
> with this issue.

My pleasure. If we could finally pin down the cause, it would be a
great improvement for btrfs.

Thanks,
Qu
>
> Best regards.
>
> ॐ
>
