linux-btrfs.vger.kernel.org archive mirror
* kernel BUG at fs/btrfs/relocation.c:437!
@ 2020-09-01  8:18 Johannes Rohr
  2020-09-03  8:15 ` Johannes Rohr
  2020-09-03 15:56 ` Josef Bacik
  0 siblings, 2 replies; 3+ messages in thread
From: Johannes Rohr @ 2020-09-01  8:18 UTC
  To: linux-btrfs

Dear devs,

I tried to replace an SSD with bad S.M.A.R.T. status and since I don't
have physical access to the server, I first wanted to remove it from the
RAID 1 (which has 4 SSDs) and then erase it.

I ran "btrfs device delete /dev/sda2 /". After a while, the command
terminated with a segfault and the system hung. I waited for 30 minutes.
Fortunately, it could be resurrected with a hard reset.

At the time this happened, dmesg reported that a block on a different
SSD, /dev/sdc, couldn't be found.

See full backtrace here:
https://gist.github.com/vasyugan/340d9cd2292e3122c1d7773df718a234

Now I am afraid that if sda is just removed physically, then marked as
degraded and swapped for a new SSD using the btrfs replace command, this
might also go wrong because of the block that can't be found.

Does any of you have advice on what to do? From the backtrace I can't
even tell whether the issue is a physical problem with sdc (whose
S.M.A.R.T. values are just fine) or another btrfs bug and, if so,
whether there is any way to work around it.

We are running Ubuntu 20.04, the kernel is 5.4.0-45-generic, Ubuntu's
version number is: 5.4.0-45.49. It was released yesterday and was
supposed to have a relocation-related bug fixed, see
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1889669

I suppose this is a separate issue. Should I report a bug? If so, where?

Thanks a lot in advance for your support!!!

Johannes

* Re: kernel BUG at fs/btrfs/relocation.c:437!
From: Johannes Rohr @ 2020-09-03  8:15 UTC
  To: linux-btrfs

Sorry for nagging, but since I got no replies and could not find any
other reports that refer to line 437 of fs/btrfs/relocation.c, I thought
I should ask again.

This is about a system running Ubuntu 20.04, uname -a says:

Linux ida.rooot.de 5.4.0-45-generic #49-Ubuntu SMP Wed Aug 26 13:38:52
UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

That's a kernel package that was released on 31 August.

I can reproduce the bug by running btrfs balance start /home -musage=70

After a few seconds, the command terminates with a segfault, dmesg
output is here:
https://gist.github.com/vasyugan/340d9cd2292e3122c1d7773df718a234

The core lines in dmesg seem to be:

[32387.616248] ------------[ cut here ]------------
[32387.616249] kernel BUG at fs/btrfs/relocation.c:437!
[32387.616271] invalid opcode: 0000 [#1] SMP PTI
[32387.616180] BTRFS error (device sdc2): couldn't find block (4853431877632) (level 1) in tree (19918) with key (986 96 124)
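
To compare repeated runs, it can help to pull the individual fields out
of the "couldn't find block" message. The following is only a minimal
sketch: the regular expression is modeled on the single message quoted
above, and the field names (bytenr, root, objectid, type, offset) as
well as the function name parse_block_error are my own labels, not
anything defined by btrfs-progs.

```python
import re

# Pattern modeled on the one dmesg line quoted in this thread.
# The group names are illustrative labels (assumptions), not kernel identifiers.
BLOCK_ERR = re.compile(
    r"BTRFS error \(device (?P<device>[^)]+)\): couldn't find block "
    r"\((?P<bytenr>\d+)\) \(level (?P<level>\d+)\) "
    r"in tree \((?P<root>\d+)\) "
    r"with key \((?P<objectid>\d+) (?P<type>\d+) (?P<offset>\d+)\)"
)


def parse_block_error(text):
    """Return the message fields as a dict, or None if the text does not match."""
    # Mail clients and terminals wrap long lines, so collapse whitespace first.
    match = BLOCK_ERR.search(" ".join(text.split()))
    if match is None:
        return None
    return {
        name: (value if name == "device" else int(value))
        for name, value in match.groupdict().items()
    }


sample = (
    "[32387.616180] BTRFS error (device sdc2): couldn't find block\n"
    "(4853431877632) (level 1) in tree (19918) with key (986 96 124)"
)
info = parse_block_error(sample)
print(info)
```

Running this over the dmesg output of each attempt and comparing the
resulting dicts shows whether the block address, tree and key really
are identical every time. The key triple is (objectid, type, offset);
in the kernel headers, type 96 is BTRFS_DIR_INDEX_KEY.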


There is nothing about an IO error in dmesg, so to me as a layman this
doesn't look like physical damage.

Yet when I repeat the command, it always names the same block.

I have of course run scrub, which, as I understand it, does the same
thing an offline btrfs check would do; it found no errors.

It would be awesome if anyone could take this up.

I have submitted the bug to bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=209143

Cheers,

Johannes



* Re: kernel BUG at fs/btrfs/relocation.c:437!
From: Josef Bacik @ 2020-09-03 15:56 UTC
  To: Johannes Rohr, linux-btrfs

On 9/1/20 4:18 AM, Johannes Rohr wrote:
> Dear devs,
> 
> I tried to replace an SSD with bad S.M.A.R.T. status and since I don't
> have physical access to the server, I first wanted to remove it from the
> RAID 1 (which has 4 SSDs) and then erase it.
> 
> I ran "btrfs device delete /dev/sda2 /". After a while, the command
> terminated with a segfault and the system hung. I waited for 30 minutes.
> Fortunately, it could be resurrected with a hard reset.
> 
> dmesg, as this happened, reports that a block on a different SSD, on
> /dev/sdc can't be found.
> 
> See full backtrace here:
> https://gist.github.com/vasyugan/340d9cd2292e3122c1d7773df718a234
> 
> Now I am afraid that if sda is just removed physically, then marked as
> degraded and swapped for a new SSD using the btrfs replace command, this
> might also go bad  because of the block that can't be found.
> 
> Does any of you have advice on what to do? From the backtrace I don't
> even understand if the issue is a physical problem with sdc (whose
> S.M.A.R.T. values are just fine) or whether this is another btrfs bug
> and if you, if there is any way to work around it.
> 
> We are running Ubuntu 20.04, the kernel is 5.4.0-45-generic, Ubuntu's
> version number is: 5.4.0-45.49. It was released yesterday and was
> supposed to have a relocation relate bug fixed, see
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1889669
> 
> I suppose, this is a separate issue. Should I report a bug? If so, where?
> 
> Thanks a lot in advance for your support!!!
> 

This error message sounds like a corrupted file system.  However, I fixed
quite a few things in relocation recently, so try a more recent kernel
first; 5.8 has all my recent fixes in this area.  If that doesn't help,
I'd run btrfs check /dev/whatever to see if it complains about your fs
being corrupted.  Thanks,

Josef
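
The suggestion to move to 5.8 or later can be checked against a
"uname -r" style release string. The snippet below is a rough sketch
only: kernel_at_least is a hypothetical helper name, and comparing
upstream version numbers says nothing about fixes a distribution may
have backported into an older kernel package.

```python
import re


def kernel_at_least(release, minimum=(5, 8)):
    """Return True if a `uname -r` style string is at or above `minimum`.

    Only the leading major.minor pair is compared; distribution suffixes
    such as "-45-generic" are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)


# The release string reported earlier in this thread.
print(kernel_at_least("5.4.0-45-generic"))  # False
```

On a live system the argument could come from platform.release() in
the Python standard library instead of a hard-coded string.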
