linux-xfs.vger.kernel.org archive mirror
* Replacing the external log device
@ 2023-04-12  7:21 Ansgar Esztermann-Kirchner
  2023-04-12 12:08 ` Christoph Hellwig
  0 siblings, 1 reply; 3+ messages in thread
From: Ansgar Esztermann-Kirchner @ 2023-04-12  7:21 UTC (permalink / raw)
  To: linux-xfs


Hello List,

what should I expect when I replace the device that contains my xfs
log? Is there a specific procedure to follow? Also, did the expected
behaviour change at some point (in kernel history)?

Some background:
When I joined my current employer in 2006, I performed some benchmarks
to see which FS would provide the best performance for our workload:
multiple NFS clients appending to large (multi-GB) files. XFS was the
clear winner, so since then, we have several dozen workstations with
XFS on mdraid (with LVM in between). For performance reasons, we keep
the log on a partition of the SSD that also holds the OS.
In all those years, the only data loss I can remember was caused by a
flaky controller that threw out disks from a RAID6 faster than they 
could be rebuilt.

When a user leaves us but their data should still be kept online, we
move the HDDs to a disk array connected to a special fileserver. The
workstation can then get a fresh install and be used for someone else.
On the fileserver, for every set of disks added, we create a new LV
in a VG dedicated to XFS logs and use that to mount the FS.
That, too, has never posed any problems (except for a duplicate UUID at
some point, but that was easily fixed with xfs_db).
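For reference, that import step looks roughly like the following. All
device, VG, and mount-point names here are hypothetical placeholders,
not our actual naming scheme:

```shell
# Sketch of the fileserver import workflow described above.
# Device and VG names are hypothetical.
lvcreate -L 2G -n scratch42_log vg_xfslogs        # fresh LV for this FS's log
xfs_repair -l /dev/vg_xfslogs/scratch42_log \
           /dev/md/scratch42                      # reformat the log onto the new LV
mount -o logdev=/dev/vg_xfslogs/scratch42_log \
      /dev/md/scratch42 /srv/scratch42
```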

However, I've been bitten by a nasty problem twice in recent weeks: in
the first instance, I wanted to replace a bunch of disks in a machine
(something like 4x10TB to 4x16TB). Usually, we do that by setting up a
new machine, rsyncing all the data, and then swapping the machines. In
this instance, I refrained from swapping the machines (due to lack of
hardware), and merely swapped the disks. Initially, the kernel refused
to mount the new disks (this was expected: the UUID of the log was
incorrect, as I only swapped the HDDs, not the log device). I called
xfs_repair to fix that. xfs_repair completed successfully, and the 
only modification reported was reformatting the log. However, the
kernel still refused to mount the file system ("structure needs
cleaning"), and a second run of xfs_repair reported hundreds of
problems. It managed to repair them all, but afterwards, the file
system was empty.  I started over, this time calling xfs_repair -L,
but the results were the same.
The hardware, kernel version, and Linux distribution were exactly the
same on both machines. 
At the time, I thought maybe there was a strange bug in that (quite
old) kernel (4.12.14 from opensuse 15.1), so I resorted to waiting for
new hardware and setting up a fresh machine.
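The sequence of commands, roughly reconstructed (device names here are
hypothetical), was:

```shell
# Reconstruction of the failed sequence described above; all device
# names are hypothetical.
mount -o logdev=/dev/ssd/log /dev/md0 /mnt   # refused: log UUID mismatch
xfs_repair -l /dev/ssd/log /dev/md0          # only reported reformatting the log
mount -o logdev=/dev/ssd/log /dev/md0 /mnt   # still refused:
                                             # "structure needs cleaning"
xfs_repair -l /dev/ssd/log /dev/md0          # second run: hundreds of problems
                                             # "repaired", FS left empty
```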

Yesterday, I did a Linux upgrade for a different user. After a clean
shutdown, I wiped the SSD (including the XFS log) and re-imaged it
with an up-to-date opensuse install. Afterwards, everything went as
described above. 

I find this extremely puzzling (especially since we've been moving
disks like this to our file server more than a dozen times, all without
any problems, and I fail to see what is different there).

I'd be happy for an explanation of what can happen to damage the FS in
this scenario -- just out of curiosity -- but of course, any steps I
can take to keep the FS intact during this procedure are also very
welcome.

Thank you,

A.
-- 
Ansgar Esztermann
Sysadmin Dep. Theoretical and Computational Biophysics
https://www.mpinat.mpg.de/person/11315/3883774



* Re: Replacing the external log device
  2023-04-12  7:21 Replacing the external log device Ansgar Esztermann-Kirchner
@ 2023-04-12 12:08 ` Christoph Hellwig
  2023-04-12 12:53   ` Ansgar Esztermann-Kirchner
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2023-04-12 12:08 UTC (permalink / raw)
  To: linux-xfs

On Wed, Apr 12, 2023 at 09:21:32AM +0200, Ansgar Esztermann-Kirchner wrote:
> However, I've been bitten by a nasty problem twice in recent weeks: in
> the first instance, I wanted to replace a bunch of disks in a machine
> (something like 4x10TB to 4x16TB). Usually, we do that by setting up a
> new machine, rsyncing all the data, and then swap the machines. In
> this instance, I refrained from swapping the machines (due to lack of
> hardware), and merely swapped the disks. Initially, the kernel refused
> to mount the new disks (this was expected: the UUID of the log was
> incorrect, as I only swapped the HDDs, not the log device).

Let me restate that:  you created a new XFS file system, but then tried
to reuse an existing log device for it?

How did you format the new file system?  XFS expects either an internal
log or an external log device.  For the above error, it must have been
formatted with a different external log.  And then you just switched the
mount option to the log device of the previous file system?

If so, that can't work, and I'm surprised you got so far.
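For illustration, an external log is bound to its file system at mkfs
time; the device names below are hypothetical:

```shell
# An external log is specified at mkfs time; data device and log device
# then form one file system (hypothetical devices).
mkfs.xfs -f -l logdev=/dev/ssd/log1,size=512m /dev/md0
mount -o logdev=/dev/ssd/log1 /dev/md0 /mnt   # must name the matching log
# Pointing logdev= at some other file system's log cannot work.
```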


* Re: Replacing the external log device
  2023-04-12 12:08 ` Christoph Hellwig
@ 2023-04-12 12:53   ` Ansgar Esztermann-Kirchner
  0 siblings, 0 replies; 3+ messages in thread
From: Ansgar Esztermann-Kirchner @ 2023-04-12 12:53 UTC (permalink / raw)
  To: linux-xfs


On Wed, Apr 12, 2023 at 05:08:56AM -0700, Christoph Hellwig wrote:
> Let me restate that:  you created a new XFS file system, but then tried
> to reuse an existing log device for it?

Yes (more or less, as the existing log device should be reformatted by
xfs_repair).
 
> How did you format the new file system?  XFS expects either an internal
> log or an external log device.  For the above error, it must have been
> formatted with a different external log.

Yes, that's correct.

> And then you just switched the mount
> option to the log device of the previous file system?

I physically replaced the disks.

> If so that can't work, and I'm surprised you got so far.

Hmm. Does that mean a zeroed log is still different from one that has
been freshly created? If so, that would be a difference between
the "working" and "not working" cases.
Or do you mean that a log device cannot be replaced even if it is
physically damaged?
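(In case it helps to compare the two cases: both halves can be
inspected read-only, e.g. along these lines; device names are again
hypothetical:)

```shell
# Read-only inspection of both halves (hypothetical devices).
xfs_db -r -c 'sb 0' -c 'p uuid' /dev/md0   # UUID stored in the data superblock
xfs_logprint -t -l /dev/ssd/log /dev/md0   # dump the log from the external device
```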

A.

-- 
Ansgar Esztermann
Sysadmin Dep. Theoretical and Computational Biophysics
https://www.mpinat.mpg.de/person/11315/3883774


