* XFS: corruption detected in 5.9.10, upgrading from 5.9.6: (partial) panic log

From: Nick Alcock @ 2020-11-22 18:38 UTC
  To: linux-xfs

So I just tried to reboot my x86 server box from 5.9.6 to 5.9.10 and my
system oopsed with an XFS corruption message when I kicked up Chromium
on another machine that mounts $HOME from the server box (the server
panicked without logging anything, because the corruption was detected
on the rootfs, and it is also the loghost). A subsequent reboot died
instantly as soon as it tried to mount root, but the next one got all
the way to starting Chromium before dying again the same way.

Rebooting back into 5.9.6 makes everything work fine again: no reports
of corruption, and starting Chromium works.

This fs has rmapbt and reflink enabled, on a filesystem originally
created by xfsprogs 4.10.0, but I have never knowingly used reflinks
under the Chromium config dirs (or, actually, under that user's $HOME
at all); I've used them extensively elsewhere on the fs, though. The fs
sits on top of a libata -> md-raid6 -> bcache stack. (It is barely
possible that bcache is at fault, but bcache has seen no changes since
5.9.6, so I doubt it.)

The relevant bits of the log I could capture -- no console scrollback
these days, of course :( and it was a panic anyway, so the top is just
lost -- are in a photo here:

  <http://www.esperi.org.uk/~nix/temporary/xfs-crash.jpg>

The mkfs line used to create this fs was:

mkfs.xfs -m rmapbt=1,reflink=1 -d agcount=17,sunit=$((128*8)),swidth=$((384*8)) -l logdev=/dev/sde3,size=521728b -i sparse=1,maxpct=25 /dev/main/root

(/dev/sde3 is an SSD which also hosts the bcache and RAID journal,
though this RAID device is not journalled, and is operating fine.)

I am not using a realtime device.
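
(For the record, assuming the usual 512-byte units for -d sunit/swidth
-- my assumption, I haven't re-checked the mkfs.xfs docs -- that
geometry works out as:

  sunit  = 128*8 = 1024 sectors * 512 B = 512 KiB chunk
  swidth = 384*8 = 3072 sectors * 512 B = 1536 KiB = 3 * sunit

i.e. three data disks' worth of stripe width on the md-raid6
underneath.)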

I have *not* yet run xfs_repair, but just rebooted back into the old
kernel, since everything works there. I'll run xfs_repair over the fs
if you think it wise, but right now I have a state that crashes on one
kernel and works on another, which seems worth leaving untouched in
case it's useful to you.
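
(If it would help, I can do a no-modify pass first from a rescue
environment, with the fs unmounted -- a sketch, untested on this box:

  # -n reports problems without writing anything; -l points
  # xfs_repair at the external log device this fs was made with
  xfs_repair -n -l /dev/sde3 /dev/main/root

That would leave the crashing state intact either way.)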

Since everything is working fine in 5.9.6 and there were XFS changes
after that, I'm hypothesising that this is probably a bug in the
post-5.9.6 changes rather than anything xfs_repair should be trying to
fix. But I really don't know :)

(I can't help but notice that all these post-5.9.6 XFS changes were
sucked in by Sasha's magic regression-hunting stable-tree AI, which I
thought wasn't meant to happen -- but I've not been watching closely,
and if you changed your minds after the LWN article was published I
won't have seen it.)

* Re: XFS: corruption detected in 5.9.10, upgrading from 5.9.6: (partial) panic log

From: Darrick J. Wong @ 2020-11-22 19:37 UTC
  To: Nick Alcock; +Cc: linux-xfs

On Sun, Nov 22, 2020 at 06:38:28PM +0000, Nick Alcock wrote:
> So I just tried to reboot my x86 server box from 5.9.6 to 5.9.10 and my

Sorry about that, there was a bad patch in -rc4 that got sucked into
5.9.9 because it had a Fixes tag.  The revert is already upstream:

https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?id=eb8409071a1d47e3593cfe077107ac46853182ab
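
To check whether a given tree already contains the revert, plain git
will do it:

  # exits 0 iff the revert commit is an ancestor of HEAD
  git merge-base --is-ancestor \
      eb8409071a1d47e3593cfe077107ac46853182ab HEAD && echo reverted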

--D

* Re: XFS: corruption detected in 5.9.10, upgrading from 5.9.6: (partial) panic log

From: Nick Alcock @ 2020-11-22 20:14 UTC
  To: Darrick J. Wong; +Cc: linux-xfs

On 22 Nov 2020, Darrick J. Wong stated:

> On Sun, Nov 22, 2020 at 06:38:28PM +0000, Nick Alcock wrote:
>> So I just tried to reboot my x86 server box from 5.9.6 to 5.9.10 and my
>
> Sorry about that, there was a bad patch in -rc4 that got sucked into
> 5.9.9 because it had a fixes tag.  The revert is already upstream:
>
> https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?id=eb8409071a1d47e3593cfe077107ac46853182ab

Thanks! Will give it a try soon :)

(Solved in a couple of hours on a Sunday and all I had to do was mail
off a photo. As I always say about free software, especially while
feeling guilty about my own response times... you can't pay for service
like this!)

... and no, I don't know why I didn't think to check the master branch
for obvious related reverts in fs/xfs. I'll do that next time before
bothering other people.
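
For next time, something like this would have shown it straight away
(a sketch, assuming origin points at a current mainline clone):

  # list reverts touching fs/xfs on master since the 5.9 release
  git log --oneline --grep='^Revert' v5.9..origin/master -- fs/xfs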
