* dm-integrity + mdadm + btrfs = no journal?
@ 2019-01-29 23:15 Hans van Kranenburg
  2019-01-30  1:02 ` Chris Murphy
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Hans van Kranenburg @ 2019-01-29 23:15 UTC (permalink / raw)
  To: linux-btrfs

Hi,

Thought experiment time...

I have an HP z820 workstation here (with ECC memory, yay!) and 4x250G
10k SAS disks (and some spare disks). It's donated hardware, and I'm
going to use it to replace the current server in the office of a
non-profit organization (so it's not work stuff this time).

The machine is going to run Debian/Xen and a few virtual machines
(current one also does, but the hardware is now really starting to fall
apart).

I have been thinking a bit about how to (re)organize disk storage in
this scenario.

1. Let's use btrfs everywhere. \:D/
2. For running Xen virtual machines, I prefer block devices on LVM. No
image files, no btrfs-on-btrfs etc...
3. Oh, and there's also 1 MS Windows VM that will be in the mix.

Obviously I can't start using multi-device btrfs in each and every
virtual machine (a big pile of horror when one disk dies or starts
misbehaving).

So, what I was thinking of is:

* Use dm-integrity on partitions on the individual disks
* Use mdadm RAID10 on top (which is then able to repair bitrot)
* Use LVM on top
* Etc...

For all of the filesystems, I would be doing backups to a remote
location outside of the building with send/receive.
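
To make the send/receive idea concrete, here is a minimal sketch that only
builds (and does not run) the pipeline for one incremental transfer. The
snapshot paths, the host name and the target directory are made up for
illustration; `btrfs send -p` and `btrfs receive` are the usual subcommands,
but check the options against the btrfs-progs version in use.

```python
import shlex


def incremental_send_pipeline(parent_snap, new_snap, remote_host, remote_dir):
    """Build the shell pipeline for one incremental btrfs send/receive.

    All four arguments are hypothetical placeholders. Both snapshots are
    assumed to be read-only, and parent_snap is assumed to already exist
    on the receiving side.
    """
    send = ["btrfs", "send", "-p", parent_snap, new_snap]
    receive = ["ssh", remote_host, "btrfs", "receive", remote_dir]
    return shlex.join(send) + " | " + shlex.join(receive)


if __name__ == "__main__":
    # Nothing is executed here; the command line is only printed.
    print(incremental_send_pipeline(
        "/snap/vm.1", "/snap/vm.2", "backup.example.org", "/backup/office"))
```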

The Windows VM will be an image file on a btrfs filesystem in the Xen
dom0. It's idle most of the time, and I think cow+autodefrag can easily
handle it. I'd like to be able to take snapshots of it which can be sent
to a remote location.

Now, to finally throw in the big question: If I use btrfs everywhere,
can I run dm-integrity without a journal?

As far as I can reason about it, I could. As long as there's no 'nocow'
happening, the only thing that needs to happen correctly is superblock
writes, right?

-- 
Hans van Kranenburg

* dm-integrity + mdadm + btrfs = no journal?
@ 2018-11-04 22:55 Hans van Kranenburg
  0 siblings, 0 replies; 10+ messages in thread
From: Hans van Kranenburg @ 2018-11-04 22:55 UTC (permalink / raw)
  To: dm-devel

Hi dm-devel list,

I have a question, or actually want to share a thought experiment about
using the stuff mentioned in the title of the post, and I'm looking for
feedback on it that either sounds like "Yes, you got it right, you can
do this." or "Nope, don't do this, you're missing the fact that XYZ".

---- >8 ----

The use case here is running a Linux server in the office of a
non-profit organization that I support in my free time. The hardware is
a donated HP z820 workstation (with ECC memory, yay) and 4x250G 10k SAS
disks.

The machine will run a bunch of Xen virtual machines. There are no
applications which demand particularly high disk write/read performance.

Encryption of storage is not a requirement. Well, the conflicting
requirement is that after power loss etc. the machine has to be able to
fully boot itself again without intervention from a sysadmin.

But, reliability is important. And that's why I was thinking about
starting to use dm-integrity in combination with mdadm raid to get some
self-healing bitrot repair capability.

The 4 disks would have an mdadm raid1 /boot on the first two disks.
Then each disk gets a partition with its own dm-integrity device on it,
and mdadm raid10 runs across those four dm-integrity devices. On top of
the raid10 goes LVM, with logical volumes for the Xen virtual machines.
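
As a rough sketch of that layout, the following only generates the list of
commands and runs nothing. Device names, partition numbers, the md device
names and the volume group name `vg0` are assumptions for illustration. The
`--integrity-no-journal` option is taken from the integritysetup manual page
as I understand it; please verify it against the installed version.

```python
def storage_plan(disks, journal=True):
    """Return the commands for raid1 /boot plus dm-integrity + raid10 + LVM.

    Assumes (hypothetically) that partition 1 of each disk is for /boot
    and partition 2 is for the dm-integrity device.
    """
    cmds = [["mdadm", "--create", "/dev/md0", "--level=1",
             "--raid-devices=2", f"{disks[0]}1", f"{disks[1]}1"]]
    mapped = []
    for disk in disks:
        part = f"{disk}2"
        name = "int-" + disk.rsplit("/", 1)[-1]
        # One separate dm-integrity device per disk, below the raid.
        cmds.append(["integritysetup", "format", part])
        open_cmd = ["integritysetup", "open", part, name]
        if not journal:
            open_cmd.append("--integrity-no-journal")
        cmds.append(open_cmd)
        mapped.append(f"/dev/mapper/{name}")
    # raid10 across the integrity devices, then LVM on top of the raid.
    cmds.append(["mdadm", "--create", "/dev/md1", "--level=10",
                 f"--raid-devices={len(mapped)}", *mapped])
    cmds.append(["pvcreate", "/dev/md1"])
    cmds.append(["vgcreate", "vg0", "/dev/md1"])
    return cmds


if __name__ == "__main__":
    for cmd in storage_plan(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"],
                            journal=False):
        print(" ".join(cmd))
```

Because mdadm sits above dm-integrity, a checksum mismatch should reach it
as a read error on one leg, which is what lets the raid repair from the
other copy.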

---- >8 ----

Now, my question is:

  If I'm using btrfs as the filesystem for all the logical volumes in
this LVM volume group, can I then run dm-integrity without a journal?

Btrfs never overwrites data in place. It writes changes to a new place
and commits a transaction by writing the btrfs superblock, which
switches what is visible after a crash to the new metadata and data.
Only during the following transaction does it allow the disk blocks that
were freed in the previous transaction to be overwritten again.

The only dangerous thing that remains here, I guess, is writing the
btrfs superblock to the mdadm raid10. If that write ends up inconsistent
between data and metadata at the dm-integrity level on every disk it
lands on, I guess I'm screwed. That would take a crash / power loss at
exactly the moment between both writes, and on both disks...
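
To check the reasoning, here is a toy model, not real dm-integrity or btrfs
behaviour. It assumes a non-journaled device writes the data sector first
and its checksum tag second, that a crash can fall between the two, and that
a mismatch surfaces as a read error. The sector numbers and payloads are
invented.

```python
import zlib


class ToyIntegrityDevice:
    """Data sectors with a separately written checksum tag, no journal."""

    def __init__(self):
        self.data = {}
        self.tags = {}

    def write(self, sector, payload, crash_before_tag=False):
        self.data[sector] = payload
        if crash_before_tag:
            return  # power loss: data landed, matching tag did not
        self.tags[sector] = zlib.crc32(payload)

    def read(self, sector):
        payload = self.data[sector]
        if self.tags.get(sector) != zlib.crc32(payload):
            raise IOError(f"integrity mismatch on sector {sector}")
        return payload


def write_mirrored(mirrors, sector, payload, crash_on=()):
    # crash_on holds the indexes of the mirrors whose write is torn.
    for index, dev in enumerate(mirrors):
        dev.write(sector, payload, crash_before_tag=index in crash_on)


def read_mirrored(mirrors, sector):
    # Like a raid mirror: fall back to the next leg on a read error.
    for dev in mirrors:
        try:
            return dev.read(sector)
        except IOError:
            continue
    raise IOError(f"sector {sector} unreadable on all mirrors")


def superblock_after_crash(crash_sector, crash_on):
    """Commit 'tree v1', attempt a second transaction, crash as requested."""
    mirrors = [ToyIntegrityDevice(), ToyIntegrityDevice()]
    write_mirrored(mirrors, 10, b"tree v1")
    write_mirrored(mirrors, 0, b"root=10")  # sector 0 plays the superblock
    # Second transaction: new tree goes to a fresh sector, never sector 10.
    write_mirrored(mirrors, 11, b"tree v2",
                   crash_on if crash_sector == 11 else ())
    if crash_sector == 0:
        write_mirrored(mirrors, 0, b"root=11", crash_on)
    try:
        return read_mirrored(mirrors, 0)
    except IOError:
        return None


if __name__ == "__main__":
    for case in [(11, {0, 1}), (0, {0}), (0, {0, 1})]:
        print(case, superblock_after_crash(*case))
```

In this model a torn write of a new tree block leaves the old superblock
and old tree readable, a torn superblock write on one mirror is covered by
the other mirror, and only a torn superblock write on both mirrors leaves
nothing readable (the function returns None).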

So the question in this case is... is the chance of this happening lower
than the chance of some old SAS disk presenting bitrot back to Linux,
and having mdadm use that and slowly cause vague errors?

---- >8 ----

Or maybe I'm missing something else entirely. I'm here to learn. :)

---- >8 ----

Additional question:

Section 4.4 "Recovery on Write Failure" mentions:
  "A device must provide atomic updating of both data and metadata. A
situation in which one part is written to media while another part
failed must not occur. Furthermore, metadata sectors are packed with
tags for multiple sectors; thus, a write failure must not cause an
integrity validation failure for other sectors."

I don't fully understand the last part. Can non-journalled dm-integrity
result in 'integrity validation failure for other sectors'? Which
sectors are those other sectors? Are they sectors that were written
earlier and are not touched during the current write? Is this a similar
thing to the RAID56 write hole? From what I read and understand so far,
that seems to not be the case, since the lost dm-integrity metadata
write would only cause IO errors for the newly added data?
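
One way to read "other sectors" is as the data sectors whose tags live in
the same metadata sector as the tag being updated. The sketch below only
does that arithmetic for a simplified flat layout. The 4-byte tag (the size
of a crc32c checksum) and the 512-byte metadata sector are assumptions, and
the real on-disk layout interleaves metadata and data areas, so this is
illustration, not an answer.

```python
def sectors_sharing_tag_sector(data_sector, tag_size=4,
                               metadata_sector_size=512):
    """Return the data sectors whose tags share one metadata sector.

    Simplified model: tags are packed back to back in sector order, so
    metadata sector k holds the tags for one consecutive run of data
    sectors.
    """
    tags_per_sector = metadata_sector_size // tag_size
    first = (data_sector // tags_per_sector) * tags_per_sector
    return range(first, first + tags_per_sector)


if __name__ == "__main__":
    neighbours = sectors_sharing_tag_sector(1000)
    print(neighbours[0], neighbours[-1], len(neighbours))
```

Under these assumptions, rewriting the tag of one data sector rewrites a
metadata sector that also carries the tags of 127 other data sectors, which
were written earlier and are not part of the current write.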

Thanks a lot in advance,
Hans


Thread overview: 10+ messages
2019-01-29 23:15 dm-integrity + mdadm + btrfs = no journal? Hans van Kranenburg
2019-01-30  1:02 ` Chris Murphy
2019-01-30  8:42 ` Roman Mamedov
2019-01-30 12:58 ` Austin S. Hemmelgarn
2019-01-30 15:26   ` Christoph Anton Mitterer
2019-01-30 16:00     ` Austin S. Hemmelgarn
2019-01-30 16:31       ` Christoph Anton Mitterer
2019-01-30 16:38     ` Hans van Kranenburg
2019-01-30 16:56       ` Hans van Kranenburg
  -- strict thread matches above, loose matches on Subject: below --
2018-11-04 22:55 Hans van Kranenburg
