XFS: corruption detected in 5.9.10, upgrading from 5.9.6: (partial) panic log

* XFS: corruption detected in 5.9.10, upgrading from 5.9.6: (partial) panic log
@ 2020-11-22 18:38 Nick Alcock
  2020-11-22 19:37 ` Darrick J. Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Nick Alcock @ 2020-11-22 18:38 UTC (permalink / raw)
  To: linux-xfs

So I just tried to reboot my x86 server box from 5.9.6 to 5.9.10 and my
system oopsed with an xfs fs corruption message when I kicked up
Chromium on another machine which mounted $HOME from the server box (it
panicked without logging anything, because the corruption was detected
on the rootfs, and it is also the loghost). A subsequent reboot died
instantly as soon as it tried to mount root, but the next one got all
the way to starting Chromium before dying again the same way.

Rebooting back into 5.9.6 causes everything to work fine again, no
reports of corruption and starting Chromium works.

This fs has rmapbt and reflinks enabled, on a filesystem originally
created by xfsprogs 4.10.0, but I have never knowingly used them under
the Chromium config dirs (or, actually, under that user's $HOME at all).
I've used them extensively elsewhere on the fs though. The FS is sitting
above a libata -> md-raid6 -> bcache stack. (It is barely possible that
bcache is at fault, but bcache has seen no changes since 5.9.6 so I
doubt it.)

The relevant bits of the log I could capture -- no console scrollback
these days, of course :( and it was a panic anyway so the top is just
lost -- is in a photo here:

  <http://www.esperi.org.uk/~nix/temporary/xfs-crash.jpg>

The mkfs line used to create this fs was:

mkfs.xfs -m rmapbt=1,reflink=1 -d agcount=17,sunit=$((128*8)),swidth=$((384*8)) -l logdev=/dev/sde3,size=521728b -i sparse=1,maxpct=25 /dev/main/root

(/dev/sde3 is an SSD which also hosts the bcache and RAID journal,
though this RAID device is not journalled, and is operating fine.)

I am not using a realtime device.

I have *not* yet run xfs_repair, but just rebooted back into the old
kernel, since everything worked there: I'll run xfs_repair over the fs
if you think it wise to do so, but right now I have a state which
crashes on one kernel and works on another one, which seems useful to
not try to fix in case you have some use for it.

Since everything is working fine in 5.9.6 and there were XFS changes
after that, I'm hypothesising that this is probably a bug in the
post-5.9.6 changes rather than anything xfs_repair should be trying to
fix. But I really don't know :)

(I can't help but notice that all these post-5.9.6 XFS changes were
sucked in by Sasha's magic regression-hunting stable-tree AI, which I
thought wasn't meant to happen -- but I've not been watching closely,
and if you changed your minds after the LWN article went in I won't have
seen it.)

^ permalink raw reply	[flat|nested] 3+ messages in thread